The Ariadne Debugger: Scalable Application of Event-Based Abstraction[1]

Janice Cuny,[2] George Forman,[3] Alfred Hough,[4] Joydip Kundu,[2] Calvin Lin,[3] Lawrence Snyder,[3] and David Stemple[2]

Revised June 1993
1 Introduction
Massively parallel computations are difficult to debug. Users are often overwhelmed by large amounts of trace data and confused by the effects of asynchrony. Event-based behavioral abstraction provides a mechanism for managing the volume of data by allowing users to specify models of intended program behavior that are automatically compared to actual program behavior [2, 3, 5, 14, 16]. Transformations of logical time ameliorate the difficulties of coping with asynchrony by allowing users to see behavior from a variety of temporal perspectives [7, 15, 19, 21]. Previously, we combined these features in a debugger that automatically constructed animations of user-defined abstract events in logical time [14]. However, our debugger, like many others, did not always provide sufficient feedback, nor did it effectively scale up for massive parallelism. Our modeling language required complex recognition algorithms, which precluded informative feedback on abstractions that did not correspond to observed behavior. Feedback on abstractions that did match behavior was limited because it relied on graphical animations that did not scale well to even moderate numbers of processes (such as 64). We address these problems in a new debugger, called Ariadne.[5]
Ariadne uses a simple language to specify behavioral abstractions as patterns of events in logical time. These patterns are detected in traces of program behavior by collections of small finite-state recognizers which allow substantive feedback on match failures. There are three salient features of Ariadne: the first is the ability to provide feedback on failures, the second is the scalability of its patterns and non-graphical output, and the third is the conciseness of its modeling language. These features, however, are accompanied by some loss of expressivity. The loss of expressivity means that patterns are often too coarse, matching behaviors in unintended ways. Ariadne compensates for this by providing functions that return the characteristics of matched behaviors. As an example, a user might match a number of multicasts in an execution trace and then use functional queries to determine which processes actually wrote values or where those values were sent. Ariadne's combination of pattern matches and functional queries allows the user to investigate an execution trace thoroughly.
Section 2 provides an overview of our approach and Section 3 provides several sample debugging
sessions illustrating its capabilities.
2 Our Approach
Ariadne is a post mortem debugger for massively parallel, MIMD message-passing systems. It is designed for correctness debugging, and it supports the user in investigating global interprocess communication patterns. Ariadne operates on traces produced by parallel programs.[6] Within these traces, processes, identified by integer process ids (PIDs), execute sequences of primitive events. The debugger currently supports four primitive event types: Read, Write, Multicast, and Phase Marker. Reads, Writes, and Multicasts denote interprocess communication, and Phase Markers denote the ends of logical phases of computation. The
traces are stored internally in an execution history graph where the nodes represent events and the edges represent communication events; from these traces we can derive Lamport's happened before relation [17]. A number of debuggers provide visualizations of execution history graphs [9, 11, 12, 21], but such visualizations do not scale well for massively parallel systems. Ariadne allows the user to view the graph, but it does not rely on visualization. Instead, it supports interactive, textual explorations of the graph.

[1] Partial support for this work was provided by the Office of Naval Research under contract N00014-89-J-1368, by the National Science Foundation under grant CCR-9023256, and by the Defense Advanced Research Projects Agency under DARPA project DAAL02-91-C-0051.
[2] Department of Computer Science, University of Massachusetts, Amherst, MA 01003.
[3] Department of Computer Science and Engineering FR-35, University of Washington, Seattle, WA 98195.
[4] Amerinex Artificial Intelligence Inc., Amherst, MA 01002.
[5] In Greek mythology, Ariadne provided Theseus with the thread that enabled him to find his way through the Labyrinth to slay the Minotaur.
[6] Currently our traces are taken from a simulator or generated by hand.
Here we briey describe three aspects of this support: the modeling language, the facilities for manip-
ulating logical temporal orderings, and the functions that are available for investigating the characteristics
of pattern matches and mismatches.
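Before turning to those aspects, it may help to see how the happened-before relation underlying them can be derived from a message-passing trace. The following is a minimal vector-clock sketch; the trace encoding (tuples of process id, event kind, and a reference to the matching Write) is our own illustration, not Ariadne's internal format.

```python
# Sketch: deriving Lamport's happened-before relation from a trace
# with vector clocks. The trace encoding is illustrative only: each
# entry is (pid, kind, ref), where a Write's ref is unused here and
# a Read's ref is the trace index of the matching Write.

def vector_clocks(trace, nprocs):
    clock = [[0] * nprocs for _ in range(nprocs)]  # one clock per process
    stamps = []                                    # timestamp of each event
    for pid, kind, ref in trace:
        clock[pid][pid] += 1                       # local step
        if kind == "R":                            # merge the sender's stamp
            clock[pid] = [max(a, b) for a, b in zip(clock[pid], stamps[ref])]
        stamps.append(list(clock[pid]))
    return stamps

def happened_before(s, t):
    """s happened before t iff s's clock is componentwise <= t's, s != t."""
    return all(a <= b for a, b in zip(s, t)) and s != t
```

With this encoding, a Write followed by its matching Read is ordered, while two unrelated Writes are concurrent (happened_before fails in both directions); that distinction is exactly what the relations of Section 2.2 are built from.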
2.1 The Ariadne Modeling Language
As mentioned above, previous attempts at using event-based abstraction in debugging have been limited by the complexity of the modeling language. Ariadne's language is quite simple. It employs a three-level description of communication patterns in terms of chains, p-chains, and pt-chains.
Chains are patterns representing "local views" of communication. They are described by extensions of regular expressions. When they are matched against an execution history graph, all events in the chain must occur exactly in the order specified, with the exception of communication events that are not physically realized because of "edge effects" on process array boundaries [1].
p-Chains are patterns representing the concurrent execution of a chain by a set of processes. They are described by binding a chain to a process set. When a p-chain is matched against a behavior, a copy of its chain is matched on each element of its process set (events can be shared across, but not within, chains).
pt-Chains are patterns representing the logical, temporal composition of a set of p-chains. They are matched in a two-step process: first, events matching the p-chains are located in the graph, and then the specified logical relations between those events are verified. When a pt-chain has been successfully matched, it returns an abstract event, which is a structure containing the matched instances of events; these instances are removed from the trace and are unavailable for further matching unless explicitly restored.
These three denitional levels appear to form a natural mechanism for describing parallel systems,
as evidenced by their use in other contexts such as the XYZ levels of Phase Abstractions [24] and the
LaRCS specication language [22]. The matching algorithm for our language is straightforward: a pt-chain
is recognized by a nite state machine that invokes copies of other nite state machines to recognize chains
on specic pro cesses. This matching can be done eciently, avoiding the costliness of pattern matching
approaches [3, 13] and the expensive implementations of previous languages [16]. At the same time, our
matching algorithm can provide precise information on the reasons for a match failure.
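To convey the flavor of this recognition (this is our own minimal sketch, not Ariadne's implementation), a chain over primitive event types can be run as a finite-state recognizer whose state is simply the position reached in the chain; on failure, that state pinpoints the expected event, in the spirit of the matchchain feedback shown in Section 3.

```python
# Sketch: matching a chain (a sequence of primitive event types,
# e.g. ["W", "R"]) against one process's event stream. The
# recognizer's state is the index of the next chain element; on
# failure it can report exactly what it expected. Alternation, the
# Kleene star, and shuffle are omitted for brevity.

def match_chain(chain, events):
    state = 0  # next chain element to match
    for ev in events:
        if state == len(chain):
            break
        if ev != chain[state]:
            return f"Match failed: Expecting an {chain[state]} but encountered {ev}."
        state += 1
    if state < len(chain):
        return f"Match failed: Expecting an {chain[state]} but encountered the Right Cursor."
    return "Match succeeded."
```

Because the recognizer halts with a concrete state, the failure message costs nothing extra to produce, which is the point of using small finite-state machines rather than a monolithic pattern matcher.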
2.2 Logical Time Manipulation in Ariadne
Programmers are often confused by the results of asynchronous executions because they cannot foresee all possible interleavings of events. In fact, most of these interleavings are irrelevant: the programmer does not care about the arbitrary orderings of events by physical time. Instead, the programmer is concerned with the logical ordering of events. At the primitive event level, this ordering is captured by Lamport's happened before relation. Other debuggers have used temporal logic to express assertions about happened before [6, 10], but they do not use behavioral abstraction. We extend the relation to abstract events, defining three relations: precedes, parallels, and overlaps [15]. Informally, if A and B are abstract events, then

A precedes B (denoted A -> B) iff there is some dependency from an event in A to an event in B but no dependency from an event in B to an event in A,

A parallels B (denoted A || B) iff there is no dependency from an event in A to an event in B and no dependency from an event in B to an event in A, and

A overlaps B (denoted A <-> B) iff there is both a dependency from an event in A to an event in B and a dependency from an event in B to an event in A.
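Given a dependency test over primitive events (a happened-before reachability query on the execution history graph), these three relations reduce to two existential checks. Below is a minimal sketch of our own, with a hypothetical depends function and abstract events represented as plain sets of node ids; it is not Ariadne's implementation.

```python
# Sketch: classifying abstract events A and B (sets of primitive
# events) by the relations above, given depends(x, y), which reports
# whether there is a dependency path from event x to event y.

def classify(A, B, depends):
    a_to_b = any(depends(x, y) for x in A for y in B)
    b_to_a = any(depends(y, x) for x in A for y in B)
    if a_to_b and b_to_a:
        return "A overlaps B"
    if a_to_b:
        return "A precedes B"
    if b_to_a:
        return "B precedes A"
    return "A parallels B"

def reachable(edges, x, y):
    """Is there a dependency path from x to y in a DAG of edges? (DFS)"""
    stack, seen = [x], set()
    while stack:
        n = stack.pop()
        if n == y:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(m for p, m in edges if p == n)
    return False
```

For instance, with the single edge (1, 2), the events {1} and {2} are related by precedes; two disconnected events are parallels; and a dependency in each direction yields overlaps.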
Similar extensions have been proposed [8, 16, 18], but we have found our definitions to be more appropriate for debugging. In particular, our precedes relation captures the notion that just some part of a complex event happens before some part of another event; other definitions require a total ordering, which is rarely found in the programs we have examined.
Abstract events may interact in complex ways, and often the programmer wishes to focus on particular aspects of that interaction, excluding all others. Thus, for example, when debugging a program that used a parallel queue, we were surprised to find that the overlaps relation held between Insert and Delete operations on the same queue location. This interdependence was the result of logical orderings imposed by the mechanisms used to resolve competition for queue locations; these orderings had nothing to do with the correctness of the implementation. When we asked the debugger to ignore all orderings except those imposed by accesses to queue locations, we were able to correctly interpret the behavior of the program [7]. Ariadne, like our previous debugger, allows the user to selectively ignore some logical orderings and thus manipulate logical time to create different perspectives on program behavior [15].
2.3 Ariadne Queries
In the design of our language, we traded expressivity for simplicity, so we cannot always describe the intended behaviors precisely. To compensate, Ariadne provides a set of functions that return characteristics of a match. The user can explore a match with queries such as

    Did all matching events occur in parallel?
    Which processes were the destinations of a Write in this match?
    Which processes matched the first Read of the pattern?
    Why did the match fail on process 12?
At this time, Ariadne provides only rudimentary feedback. Our current work is aimed at expanding its repertoire. Although simple, Ariadne has proved effective in finding a variety of parallel program bugs. We have found it flexible in allowing the user to examine program behavior from different viewpoints. The next section describes several Ariadne debugging sessions illustrating its modeling language and functionality.
3 Sample Ariadne Debugging Sessions
In this section, we illustrate the power of our modeling language, the use of logical time, and our query facilities. We describe four sample debugging sessions using the Ariadne prototype.[7]
3.1 Permutation
Our rst example involves a simple permutation of values, stored one per process: each process computes
its receiving process, sends its value to the appropriate destination, and reads a new value. The program we
debugged worked correctly on small processor arrays but failed on an array of 512 processes.
Summary of Debugging Session. In examining an execution history graph from the 512-process system, we first tried to match the expected pattern for a permutation (that is, a Write followed by a Read on each process). The pattern was not found, and we were told that only processes with PIDs less than 256 completed their W R chain. In examining the behavior of the remaining processes, we discovered that all of the processes wrote their values to the lower 256 processes. This pointed us to an error in the address calculations.
[7] The Ariadne prototype is only partially implemented. The matching facilities and many of the query functions have been implemented, but the syntax we use here is not yet available in the prototype. In addition, several recent modifications to the language, including the use of WRT clauses as filters and the shuffle operator on the chain level, have not yet been implemented. The examples in this paper, with the exception of the triangulation session, have been run on the prototype, although in a few cases we had to hand-compute values returned by query functions.
Debugging Session. We began by defining chain and p-chain patterns to be used in matching. Chains are described by a slight extension to regular expressions. Our expressions, over primitive event types, use operations of concatenation (denoted by adjacency), alternation (+), shuffle (|), and Kleene star (*).[8]
For this example, we modeled the activity of a process with the pattern W R, which describes a Write followed by a Read on a single process. For future reference, we also named the components of the pattern and the pattern itself. The complete chain definition was thus

? PermutationChain = send: W receive: R

where send and receive name the events matched by W and R respectively. The Ariadne prompt, ?, is used in this paper to distinguish between lines of user input and debugger output. p-Chains are described by binding a chain to a set of processes. In this case, PermutationChain was bound by

? Permutation = PermutationChain ONALL PROCS

to the set of all processes (denoted PROCS). The keyword ONALL indicates that a copy of the chain must originate on each member of that set.
pt-Chains are used to describe the temporal composition of p-chains. In this example, we are looking for a single instance of Permutation, and thus the specification is trivial. We ask for a match with

? PermuteEvent = match Permutation

If the match had succeeded, PermuteEvent would have been set to the resulting abstract event. In this case, however, it failed; PermuteEvent was set to ? and the following feedback was given

Match failed: Search failure.
Looking for Permutation.
Found 256 chains on {0..255}; using 256 processes.

This indicates that the search failed while looking for the p-chain Permutation and that, during the search, 256 complete chains were recognized, one initiating on each of the processes numbered 0 through 255. The last part of the message gives the number of processes that had primitive events matched by the completed chains.
This information told us that no process above 255 did both a Write and a Read. To investigate further, we asked for additional information about the behavior of a specific process using the matchchain command. Matchchain attempts to recognize a single chain on a single process and provides feedback on the reason for a failure. In this case, it reported

? matchchain PermutationChain ON 256
Match failed: Chain failure.
Expecting an R but encountered the Right Cursor.

The Right Cursor marks the right end of the portion of the execution history graph that we are examining. Thus this response indicated that process 256 wrote a value but did not read one. Since this did not seem to help in locating the error, we tried another tactic.
We broke the pattern into two simple pieces, first matching all of the Writes and then matching all of the Reads. The dialogue went as follows:

? PWrite = W ONALL PROCS
? WriteEvents = match PWrite
Match succeeded.
Found 512 p-chains on {0..511}; using 512 processes.
? PRead = R ONALL PROCS
? ReadEvents = match PRead
Match failed: Search failure.
Looking for PRead.
Found 256 chains on {0..255}; using 256 processes.

[8] Note that the shuffle operator is expensive to implement. We included it to model the behavior of guarded input commands; thus in practice, we would expect that only a few items will get shuffled at a time. We allow its use only on the chain level.
This is more helpful. It indicates that all Writes occurred as expected but no process above 255 received a value. Where did the missing values go? We found out by asking

? destinations (WriteEvents)

This is our first example of a function returning the characteristics of a match. Each such function takes an abstract event as its argument and recursively searches the structure of that event. In the case of the destinations function, the structure is searched for WriteEvents and the set of destinations for those events is returned. The feedback was

Values written to 256 processes: {0..255}.

indicating that all of the values written by the 512 processes were directed at the lower 256 processes. Clearly there was some problem in the address calculation code. In fact, the variable holding the destination PID was mistakenly allocated to be an 8-bit quantity; larger values were truncated. Thus, only processes whose identifier was less than 256 could complete correctly.
If the same address truncation had occurred in a program for compression rather than permutation, it would have been harder to detect. A compression is very much like a permutation except that not all of the processes write or read values. During compression, the nonzero elements of an array (stored one value per process) are moved to the beginning of that array. Only processes with nonzero data values Write, and only processes at the beginning of the array that are to receive a value Read. Thus, the chain definition used above, W R, would not work because it is a process-centered chain that follows the activity of a single process. Instead, we must use a data-centered pattern that follows the path of a communication. This is described by the pattern

? CompressionCh = send: W @ receive: R

The @ moves the context of the match (that is, it changes the process and the location for matching) to the receiving process. Thus this pattern matches a Write on the initiating process and then follows that communication edge to continue matching on the process that is its destination.
Not all of the processes execute the CompressionCh pattern (only those initially holding nonzero values), so we cannot determine a priori the set of processes for binding. Instead we use

? Compression = CompressionCh ONSOME PROCS

where the keyword ONSOME indicates that a successful match need only occur on a nonempty subset of the given process set. Now, however, when we do the match, it succeeds despite the presence of the truncation error:

? CompressionEvent = match Compression
Match succeeded.
Found 256 p-chains on {1,12,24..33,37,45,47,56,58..65,78,89..112,129..137,
139,141,143..149,156,158,160..189,196,197..234,241,245..258,267,269,276,
280..298,301,314,324,356,358,367..391,413,415,433..456,470..494};
using 362 processes.
Note that the process set is not consecutive because chains are only found on those processes that initially have nonzero values. This result looks correct; there is no indication that some values were written but never read. We could detect this sort of error only by asking if anything remained in the trace after the match. (Remember that a match removes the matching events from the execution history graph.) This can be done with

? Left = match REMAINDER

where REMAINDER is a predefined pattern that matches anything. In our example, where the specific trace we were using had 290 processes with nonzero values, the result was

Match succeeded.
Found 34 p-chains on {15..18,79,115..126,190..194,335,495..498,501..507};
using 34 processes.

meaning that 34 "extra" events were found that had not been matched by the Compression chains. What were these events? We can find out with

? eventtypes (Left)
34 Writes
0 Multicasts
0 Reads
0 Phase Markers
where eventtypes is again a function that returns a match characteristic. In this case, it counts the number of primitive events of each type. The answer indicated that all of the unmatched events were Writes. We are now in the same position that we reached in the permutation example: we know that only processes below 256 completed and that some of the values that were written were never read. The destinations function would lead us to the error in the same manner as above.
3.2 Gaussian Elimination
Our next example comes from a parallel Gaussian elimination program. The program operated on an input matrix stored one row per process. In reducing that input to an upper triangular matrix, it executed a number of iterations (one for each process), each beginning with the broadcast of a pivot row. The program produced incorrect results when run on large systems of equations; we report here on an instance with 256 equations, running on a 256-processor system.
Summary of Debugging Session. In tracking down the bug, we first determined that all 256 broadcasts occurred and that each process performed exactly one broadcast. We then attempted to ascertain that each broadcast logically preceded the next, but this turned out to be untrue, leading us to the error: the programmer had omitted necessary barrier synchronizations between broadcasts.
Debugging Session. We defined a broadcast chain as

? BroadcastChain = senders: M @ receivers: R

In this case, because the send operation was a multicast rather than a single write, the read in the pattern matches all of the Reads associated with that M, enabling us to uniformly handle single writes, multicasts, and broadcasts.
The p-chain specification created a separate broadcast event for each process. Using a for-loop, we defined an array of Broadcast p-chains:

? for (i=0; i <= P-1; i++) do Broadcast[i] = BroadcastChain ONALL {i} od

where P is a defined constant giving the number of processes in the system. Each of these p-chains will match a process performing a single write that is read by all other processes.

To ascertain that the correct number of broadcasts were performed, we attempted to match a set of P broadcasts with

? BCseries = match (Broadcast[])^P
The missing index in Broadcast[] indicates that any element can be used, making it a shorthand for (Broadcast[0] + Broadcast[1] + ... + Broadcast[255]), where + means alternation in our expressions. The match was successful, resulting in

Match succeeded.
Found 256 p-chains on {0..255}; using 256 processes.
Since BCseries matches a p-chain, it could potentially match all P occurrences on the same process, so we have to look at the output carefully. In this case, the Broadcast events occurred on 256 processes, and thus each process must have initiated one. We then used

? owners (BCseries WRT {receivers})
256 owners: on processes {0..255}.
owners is again a function that determines the characteristics of a match. In this case, its argument is an abstract event that is modified by a WRT clause acting as a filter. WRT {receivers} makes only events of type receivers (as defined in the chain pattern) visible to the search; thus this query determines the set of processes executing receives in BCseries. It told us that every process executed a receive. We determined the total number of receives with
the total number of receives with
? count (BCseries WRT
f
receivers
g
)
65,280 occurrences: on processes
f
0..255
g
.
which told us that every process read every broadcast (255
256 = 65
;
280). We now knew that the correct
number of events occurred and so we attempted to verify that the correct logical relation {
precedes
{ had
held between them.
The Broadcast events have already been removed from the trace because they were successfully matched above. We could check for the precedes relation in two ways. We could restore the Broadcasts to the trace and rematch them with

? restore (BCseries)
? OrderedBCseries = match < Broadcast[] * >

where the angle brackets indicate that precedes must hold between matched events. Alternatively, we could use a predicate over abstract events[9]

? e_precedes (BCseries)

In either case the relation precedes is checked in the matched abstract event using its search order; the result is

Precedes failed.
Broadcast[38] overlaps Broadcast[243].
It tells us that the rst 135 broadcasts occurred correctly but the 135th
overlapped
with the 136th meaning
that
precedes
did not hold between them. This must mean that some process { either
38
or
243
{ read
the broadcast values out of order. This observation led us to the bug: a missing synchronization between
broadcasts.
3.3 Delaunay Triangulation
This example demonstrates that Ariadne can model complex behaviors. The program is a parallel version
of Bowyer's algorithm to construct a Delaunay Triangulation [4]. In Bowyer's algorithm, points are inserted
into an existing mesh one at a time; in our version, they are inserted in parallel. Each point is managed by
its own process which communicates with surrounding pro cesses looking for triangles with circumcircles
10
that contain its point. The triangles located in this search are \lo cked" to prevent concurrent access by
other insertions, and the p olygonal region they form is modied to add the new point. To avoid deadlock,
conicting requests for locks on triangles are resolved by ab orting one of the insertions.
Summary of Debugging Session. When our program ran, it completed, but the triangles it formed did not meet the Delaunay criteria. The sequences of insertions appeared to be correct. We hypothesized that, despite the locking mechanism, some of the insertions interfered with each other. We checked this by looking for insertions that overlapped. We found three such pairs and, in examining the processes that executed those insertions, we determined that our locking mechanism essentially locked the sides of triangles but not their vertices.
Debugging Session. In the current version of Ariadne, we do not have access to the contents of a message or its type. It is not practical to include the contents of all messages in every trace, but it is possible to use a replay mechanism [20, 23] to acquire additional trace information. We expect to include access to such information in future versions of our debugger. For purposes of this example, we achieved the same effect by modifying the program so it sends different message types on different, named channels. Thus, for example, the request for a lock is sent on port req and the response to that request is sent either on ok or on no. The debugger detects this use of ports. Within patterns, port names are appended to communication events with an underscore; matched instances of these events must have the correct port names.

[9] The prefix e_ on precedes indicates that this is a predicate over a single abstract event; other versions of this same predicate operate over sequences of events.
[10] The circumcircle of a triangle is the circle that can be drawn through all of the vertices of that triangle.
For expository reasons, we model only the part of the behavior relevant to the error; that is, we model only point insertions. These insertions begin with some number of attempts to lock relevant triangles:[11] the initiating process sends a multicast requesting the relevant locks on req, the recipients respond on either ok or no, and the initiating process collects the responses. A simple form of this pattern might be

M_req @ R (W_ok @ R + W_no @ R)

where the control of matching begins on the initiating process with the multicast and splits at the first @ to proceed independently on each of the receiving processes.
This simple expression, however, is not sufficient because we must also model subsequent behavior on the initiating process. In the case of an unsuccessful attempt, the initiating process subsequently sends an "abort" message and then retries the lock attempt; in the case of a successful attempt, it subsequently attempts to get all relevant triangles to commit to the update. Thus, we must return the control of matching to the initiating process. We indicate this by marking the point of the split with the symbol <@ (replacing the @) and the point of the return with the symbol @>. Thus a lock attempt is defined as

LockAttempt = M_req <@ R (W_ok @ R + W_no @ R) @>
Similarly, we define an abort, an unsuccessful attempt to commit, and a successful attempt to commit

Abort = M_abort <@ R @>
CommitNo = M_com <@ R W_committed @ R @>
CommitYes = M_com <@ R start:W_committed @ R @>

where the CommitYes includes a tag on events that essentially marks the beginning of the critical region for the insert. The initial, unsuccessful attempts are matched by

Unsuccessful = ( (LockAttempt Abort) * LockAttempt CommitNo Abort ) *

and the ultimately successful attempt by

Successful = (LockAttempt Abort) * LockAttempt CommitYes
Once the locking attempt succeeds, the initiating process performs the actual insertion of its point by sending a multicast on port add and waiting for acknowledgments on port done. The pattern is

Addition = M_add <@ R end:W_done @ R @>

where the tag end is used to mark the end of the critical region for this insert. The entire chain and the needed p-chains are as follows

? Insert = Unsuccessful Successful Addition
? for (i=0; i <= P-1; i++) do AddPoint[i] = Insert ONALL {i} od
We successfully matched the expected behavior with the command:

? Triangulation = match (AddPoint[]) *

This led us to conclude that all of the needed transactions had occurred. We hypothesized that there must have been some interference between insertions. To check this, we used the following query, looking just at the "successful" portion of the matched additions.

? e_non_overlaps(Triangulation WRT {start,end})
Assertion Failed.
AddPoint[17] overlaps AddPoint[13]
AddPoint[55] overlaps AddPoint[54]
AddPoint[100] overlaps AddPoint[98]

The feedback on the failure of this assertion led us to investigate the pairs of points that had overlapping insertions. We discovered that the processes in each pair shared common triangle vertices. This led us to an error in our locking mechanism: in effect, we were locking the sides of the triangles but not their vertices.
Ariadne was designed as a testbed for investigating the utility and limitations of various types of match feedback. The above examples demonstrate successful uses of its current features. In the next section, we give an example of a program for which it was not successful.

[11] Processes recompute the set of relevant triangles immediately before each lock attempt, but that behavior is not modeled here.
4 The Limitations of Textual Feedback
In this example, we consider a program that implements a dictionary search in which queries are pipelined from a host to a database of key-ordered records stored in a hypercube. Queries are routed within the cube to the proper node using binary search. More than one query is active at a time. The program as written contained a routing error.

We consider an 8-processor cube with processes having PIDs 0 through 7 and a host process with PID 8. We model the behavior of the program as a series of queries, each query starting at the host, traversing the cube, and eventually returning to the host. The query chain uses two features we have not encountered thus far: the definition of a set of processes (Cube) and the limitation of a communication event to a set of processes (denoted by # followed by a process set).
? Cube = {0..7}
? QueryChain = W#{8} @ R#Cube ( W#Cube @ R#Cube ) * W @ R#{8}
? Query = QueryChain ON {8}
? match Query *
Match succeeded.
Found 2 p-chains on {8}; using 8 processes.
The match succeeds but it does not give us any information about the error. Further investigations using
Ariadne did not help. We had better success in debugging this program with our previous animating
debugger, Belvedere [14].
In using Belvedere, we also defined an abstract event that matched the entire set of messages associated with a query; the query itself was much more complex (Belvedere uses the EDL modeling language [3]). Initially, the animation was incomprehensible, as shown in Figure 1a, because the Query events overlapped in logical time: each query follows data-dependent paths through the cube, arriving in different orders at different processes. To separate the events, we created a perspective on the animation that included only dependencies caused by Write events on the host process (this is the same functionality provided by Ariadne's WRT clauses). Two snapshots from these perspective views are shown in Figures 1b and 1c. They portray the same execution trace that we used above with Ariadne. Now, however, the erroneous behavior is easy to spot: in Figure 1c, a query crosses a dimension of the cube twice.
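A perspective view can be thought of as recomputing logical time while keeping only selected happened-before edges. The following sketch (hypothetical event format and function names; not Belvedere's or Ariadne's actual code) applies a Lamport-style clock in which a receive inherits the sender's time only when the message passes a filter, such as "sent by the host":

```python
# Hypothetical sketch of a "perspective": recompute Lamport-style logical
# time keeping only the message dependencies of interest (here, messages
# sent by the host), so overlapping abstract events separate in time.
HOST = 8

def logical_times(events, keep=lambda send: True):
    """events: list of ('send'|'recv', pid, msg_id).  Returns per-event
    logical times; a 'recv' inherits the sender's clock only if the
    message passes the `keep` filter."""
    clock = {}           # per-process logical clocks
    sent = {}            # msg_id -> (sender pid, time of send)
    times = []
    for kind, pid, mid in events:
        t = clock.get(pid, 0) + 1
        if kind == 'send':
            sent[mid] = (pid, t)
        elif mid in sent and keep(sent[mid]):
            t = max(t, sent[mid][1] + 1)
        clock[pid] = t
        times.append(t)
    return times

events = [('send', 8, 'q1'), ('recv', 0, 'q1'),
          ('send', 0, 'm1'), ('recv', 4, 'm1')]
print(logical_times(events))                               # -> [1, 2, 3, 4]
print(logical_times(events, keep=lambda s: s[0] == HOST))  # -> [1, 2, 3, 1]
```

In the filtered ordering, events with no retained dependency on the host fall back to their local clocks, which is what lets concurrent queries be drawn separately.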
As the programmers of this code, we knew that message transmissions should follow the path of a binary search. Once half of the remaining cube is eliminated by a comparison, the search should never go back to that subtree by crossing the same dimension of the cube again. Investigation of this behavior led us to discover a routing error in the initial calculation of the return path for a query.
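The invariant that the animation exposed can also be stated directly: each hop in a hypercube crosses exactly one dimension (flips one bit of the node address), and a binary search should never cross the same dimension twice. A small hypothetical checker (the function names and routes are ours, not part of Ariadne) makes this concrete:

```python
def crossed_dimensions(path):
    """Return the list of cube dimensions (bit positions) crossed by a
    route given as a sequence of node ids; each hop must flip one bit."""
    dims = []
    for a, b in zip(path, path[1:]):
        d = a ^ b
        assert d and (d & (d - 1)) == 0, "hop must cross exactly one dimension"
        dims.append(d.bit_length() - 1)
    return dims

def is_valid_binary_search_route(path):
    """A binary search route never crosses the same dimension twice."""
    dims = crossed_dimensions(path)
    return len(dims) == len(set(dims))

print(is_valid_binary_search_route([0, 4, 6]))     # dims 2, 1    -> True
print(is_valid_binary_search_route([0, 4, 6, 2]))  # recrosses 2  -> False
```

The second route corresponds to the erroneous behavior in Figure 1c: the final hop recrosses a dimension that the search had already eliminated.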
Figure 1: Snapshots from an animation of the Dictionary Search. Concurrent abstract events (a); a perspective view of an abstract event showing the path taken by an individual request (b); and a perspective view of a second query showing an extra communication from the front to the back plane of the cube (c).
The routing error was immediately apparent from the animation, but we could not find it with Ariadne. It is not possible to concisely describe a query that finds this anomaly; worse, it is unlikely that the programmer would even think to ask such a query. The anomaly was detected as a deviation in a visual pattern. This example serves as an indicator that we will not be able to completely avoid graphical output. In an independent effort, we are developing scalable graphical representations of massively parallel computations, and we eventually expect to combine the two efforts.
5 Conclusion
We have introduced a new approach to the application of event-based abstraction to massively parallel computing. Previous methods were limited by their modeling languages: sufficiently expressive languages required very complex matching algorithms that admitted only very limited feedback on the extent of a match. In some cases, the feedback was graphically presented in ways that did not scale to massively parallel systems. Our approach uses a simple modeling language that describes global patterns of communication in terms of parallel compositions of local patterns. This produces concise, scalable definitions and allows for more informative feedback. We compensate for the loss of expressivity by allowing the user to interactively explore the extent to which a model matches the execution trace. We do not rely on graphical renderings, and thus our techniques work well for even moderately large numbers of processes. We have implemented a prototype called Ariadne and have illustrated the effectiveness of this approach by presenting sample Ariadne debugging sessions involving actual parallel programs.
Ariadne was designed as a testbed for exploring the scalable application of event-based behavioral abstraction. We are currently evaluating the expressivity of its language and functional queries. In addition, because programmers are reluctant to learn new modeling languages for the sake of debugging, we are considering graphical languages that might make the description of patterns less onerous. We are also designing techniques for producing graphical displays of program behavior that would scale well. Finally, because Ariadne will eventually have to be integrated into a more complete debugging system, we are investigating extensions to aspects of program behavior other than communication.
6 Acknowledgements
We thank a number of people for their contributions to this work. The Ariadne Development Team designed and implemented the prototype: Ruth Anderson, Sung-Eun Choi, Jeffrey Dean, Donald A. Lobo, Ton Anh Ngo, and W. Derrick Weathersby. Lee Delaney and Patrick Donohue tracked down some of its lingering bugs. Bruce Leban commented on earlier versions of the paper.
References
[1] G. Alverson, W. Griswold, D. Notkin, and L. Snyder. A flexible communication abstraction for nonshared memory parallel computing. Proceedings of Supercomputing '90, 1990.
[2] F. Baiardi, N. De Francesco, and G. Vaglini. Development of a debugger for a concurrent language. IEEE Transactions on Software Engineering, SE-12(4):547-553, Apr. 1986.
[3] P. C. Bates. Debugging Programs in a Distributed System Environment. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1986. Also COINS Technical Report 86-05.
[4] A. Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24(2):162-166, Feb. 1981.
[5] B. Bruegge and P. Hibbard. Generalized path expressions: A high level debugging mechanism. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on High-Level Debugging, pages 34-44, 1983.
[6] R. Cooper and K. Marzullo. Consistent detection of global predicates. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 167-174, 1991.
[7] J. E. Cuny, A. Hough, and J. Kundu. Logical time in visualizations produced by parallel programs. Proceedings of Visualization '92, pages 186-193, 1992.
[8] C. J. Fidge. Partial orders for parallel debugging. SIGPLAN Notices, 24(1):183-194, 1989.
[9] R. J. Fowler, T. J. LeBlanc, and J. M. Mellor-Crummey. An integrated approach to parallel program debugging and performance analysis on large-scale multiprocessors. SIGPLAN Notices, 24(1):163-173, 1989.
[10] G. S. Goldszmidt, S. Katz, and S. Yemini. High level language for debugging concurrent programs. ACM Transactions on Computer Systems, 8(4):311-336, Nov. 1990.
[11] P. K. Harter, D. M. Heimbigner, and R. King. IDD: an interactive distributed debugger. In Proceedings of the 5th International Conference on Distributed Computing Systems, pages 498-506, 1985.
[12] M. Heath and J. Etheridge. Visualizing the performance of parallel programs. IEEE Software, 8(5):29-39, 1991.
[13] D. Helmbold and D. Luckham. Debugging Ada tasking programs. IEEE Software, 2(2):47-57, Mar. 1985.
[14] A. A. Hough. Debugging Parallel Programs Using Abstract Visualizations. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1991. Also COINS Technical Report 91-53.
[15] A. A. Hough and J. E. Cuny. Perspective views: A technique for enhancing visualizations of parallel programs. In 1990 International Conference on Parallel Processing, pages II 124-132, Aug. 1990.
[16] W. Hseush and G. E. Kaiser. Modeling concurrency in parallel debugging. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 11-20, Mar. 1990.
[17] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, 1978.
[18] L. Lamport. The mutual exclusion problem: Part I, a theory of interprocess communication. Journal of the Association for Computing Machinery, 33(2):313-326, Apr. 1986.
[19] R. J. LeBlanc and A. D. Robbins. Event-driven monitoring of distributed programs. In Proceedings of the 5th International Conference on Distributed Computing Systems, pages 515-522, 1985.
[20] T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Transactions on Computers, C-36(4):471-482, Apr. 1987.
[21] T. J. LeBlanc, J. M. Mellor-Crummey, and R. J. Fowler. Analyzing parallel program executions using multiple views. Journal of Parallel and Distributed Computing, 9:203-217, 1990.
[22] V. M. Lo, S. Rajopadhye, M. A. Mohamed, S. Gupta, B. Nitzberg, J. A. Telle, and X. X. Zhong. LaRCS: A language for describing parallel computations for the purpose of mapping. Technical Report CIS-TR-90-16, University of Oregon Dept. of Computer Science, 1990.
[23] B. Miller and J.-D. Choi. A mechanism for efficient debugging of parallel programs. SIGPLAN Notices, 24(1):141-150, 1989.
[24] L. Snyder. The XYZ abstraction levels of Poker-like languages. Languages and Compilers for Parallel Computing, David Gelernter, Alexandru Nicolau, and David Padua (eds.), MIT Press, pages 470-489, 1990.