Article

CODE: A Unified Approach to Parallel Programming

Authors: J. C. Browne, M. Azam, S. Sobek

Abstract

The authors describe CODE (computation-oriented display environment), which can be used to develop modular parallel programs graphically in an environment built around fill-in templates. It also lets programs written in any sequential language be incorporated into parallel programs targeted for any parallel architecture. Broad expressive power was obtained in CODE by including abstractions of all the dependency types that occur in the widely used parallel-computation models and by keeping the form used to specify firing rules general. The CODE programming language is a version of generalized dependency graphs designed to encode the unified parallel-computation model. A simple example is used to illustrate the abstraction level in specifying dependencies and how they are separated from the computation-unit specification. The most important CODE concepts are described by developing a declarative, hierarchical program with complex firing rules and multiple dependency types.
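As a concrete (and purely illustrative) rendering of the model just described, the C sketch below shows the flavor of a generalized dependency-graph node: input arcs hold queued data, the firing rule is an arbitrary predicate over the arcs, and the computation unit is ordinary sequential code kept separate from the dependency specification. The names and the conjunctive rule are our own assumptions, not CODE's notation.

    #include <stdbool.h>
    #include <stdio.h>

    #define CAP 16

    /* One input arc of a node: a FIFO of pending data values. */
    typedef struct { int buf[CAP]; int head, tail; } Arc;

    static bool has_datum(const Arc *a) { return a->head != a->tail; }
    static void put(Arc *a, int v)      { a->buf[a->tail++ % CAP] = v; }
    static int  take(Arc *a)            { return a->buf[a->head++ % CAP]; }

    /* A firing rule is any predicate over the states of the input arcs.
       This one is conjunctive; CODE's general rules may also be
       disjunctive or mix dependency types. */
    static bool may_fire(const Arc *x, const Arc *y) {
        return has_datum(x) && has_datum(y);
    }

    /* The computation unit is plain sequential code, specified
       separately from the dependencies. */
    static int compute(int a, int b) { return a + b; }

    int main(void) {
        Arc x = {0}, y = {0}, out = {0};
        put(&x, 1); put(&x, 2);              /* upstream nodes deposit data */
        put(&y, 10);
        while (may_fire(&x, &y))             /* scheduler: fire while enabled */
            put(&out, compute(take(&x), take(&y)));
        while (has_datum(&out))
            printf("%d\n", take(&out));      /* prints 11; 2 stays queued on x */
        return 0;
    }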


... Few researchers have combined Artificial Intelligence with a compiler to automate the process. (The citing text's footnotes explain that Java is a concurrent, class-based, object-oriented programming language; that C++ is a programming language standardized by the International Organisation for Standardization (ISO); and that logic programming is a programming paradigm largely based on formal logic.) It has been summarised in the research paper "Artificial Intelligence meets Compilers", where a human or machine writes programs based on knowledge of algorithms, data structures, design patterns, etc. (Browne et al., 1989; Weiss and Gerhard, 1999). Programming by Example projects would use AI mechanisms to accelerate computing by creating automatic database queries (Browne et al., 1989). ...
... AI techniques should be initiated to represent, find, and substantiate design patterns so as to automate the program. ...
... The belief that '…programming hasn't changed in 30 years…' (Browne et al., 1989) can be challenged. The power-based strategy for AI uses the speed and power of the computer to derive answers to problems by search, starting from a small number of principles. ...
Article
Full-text available
... The observed orderings, therefore, indicate those dependences of actions that "caused" the events to be ordered. So, the event orderings of Fig. 2(a) indicate that they were caused by the dependences of Fig. 2(b): ordering p_1 < w_3 was caused by the dependence (p, w), ordering q_1 < w_2 by the dependence (q, w), and ordering w_1 < w_2 by the dependence (w, w). ...
... The write-read dependence on a shared synchronization variable, or on a message, forces the actions to execute in a particular order. Our motivation for representing the synchronization dependences as data-flow dependences comes from the language-independence and machine-independence goals of the CODE graphical programming environment [2], [15]. The data-flow characterization of the synchronization and control-flow dependences in CODE allows the environment to support shared-memory as well as distributed systems. ...
... The representation of the state of a data-flow dependence in § 2.0 by a string of values (or an infinite FIFO buffer) takes this dependence into account. Moreover, it allows us to model the general cases of the send and receive of messages in distributed systems, and the data-flow dependences of graphical/visual languages like CODE [2], [15]. But the representation may create problems in modeling the synchronization primitives of shared-memory systems. ...
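The shared-memory caveat in the snippet above can be made concrete with a few lines of C. This is our illustration, not code from the cited work: a data-flow dependence modeled as a string of values remembers every send, while a binary synchronization flag saturates, so two signals before a wait collapse into one.

    #include <stdbool.h>
    #include <stdio.h>

    /* Data-flow view: the dependence state is a string of values, so
       every send is remembered (modeled as a count of queued tokens). */
    typedef struct { int pending; } FlowDep;
    static void flow_send(FlowDep *d) { d->pending++; }
    static bool flow_recv(FlowDep *d) { return d->pending > 0 ? (d->pending--, true) : false; }

    /* Shared-memory view: a binary flag saturates, losing the second
       of two back-to-back signals -- the modeling problem noted above. */
    typedef struct { bool set; } BinFlag;
    static void flag_signal(BinFlag *f) { f->set = true; }
    static bool flag_wait(BinFlag *f)   { return f->set ? (f->set = false, true) : false; }

    int main(void) {
        FlowDep d = {0}; BinFlag f = {false};
        flow_send(&d); flow_send(&d);         /* two sends   */
        flag_signal(&f); flag_signal(&f);     /* two signals */
        printf("fifo: %d %d\n", flow_recv(&d), flow_recv(&d)); /* 1 1 */
        printf("flag: %d %d\n", flag_wait(&f), flag_wait(&f)); /* 1 0 */
        return 0;
    }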
... The template-based approach to parallel programming was used in the late 1980s in systems like CODE [6] and FrameWorks [27]. Some recent systems based on similar techniques include CODE2 [7], Enterprise [25], HeNCE [7], PUL-TUF [31], and Tracs [4]. ...
... Its significance was recognized and explored by a number of researchers and system designers. A number of parallel programming systems support such structures [2, 4, 6, 7, 10-12, 25-27]. All these systems employ the idea of separation of specifications (also refer to section 1). ...
Article
Parallel programming is complicated. This complexity arises from the compounding of low-level parallelism-related issues with the problems of writing good sequential code. Over the years, various approaches have been proposed to aid parallel program developers. These approaches employ high-level models of parallel computation, thus hiding the low-level parallelism-related details from the user. Different approaches employ different abstraction techniques, such as communication libraries, macros, new parallel languages and abstract data types. In this paper, we present a template-based approach to parallel application development, which uses frequently occurring patterns for parallelism. A parallel template is a re-usable, application-independent encapsulation of a commonly used parallel computing pattern. It is implemented as a re-usable code-skeleton for quick and reliable development of parallel applications. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of this approach is that the templates in this model are generic, with associated structural and behavioral attributes which can be parameterized. Templates have standard interfaces which facilitate their composition. Unlike the similar approaches in the past, which were mostly suitable for solving a limited subset of parallel applications, this approach provides a systematic development model for the hierarchical development and the subsequent refinement of a vast majority of coarse-grained parallel applications, which can be suitably solved on a network cluster. Two of the main issues addressed are: the degree of flexibility in application development and the extendibility (hence adaptability) of the development system as per the user's needs. Both of these issues were major concerns in the past.
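The abstract's notion of a generic template with parameterizable structural and behavioral attributes can be sketched in C as a structure that carries the application-independent shape plus user-supplied hooks. This is a minimal sketch under our own naming, not the system's actual interface, and it runs the "farm" sequentially where a real system would dispatch in parallel.

    #include <stdio.h>

    /* A parallel "template": application-independent structure with a
       structural attribute (workers) and behavioral attributes (hooks). */
    typedef struct {
        int workers;                        /* structural attribute        */
        int (*work)(int task);              /* behavioral: per-task code   */
        int (*merge)(int acc, int partial); /* behavioral: combine results */
    } FarmTemplate;

    static int farm_run(const FarmTemplate *t, int ntasks) {
        int acc = 0;
        for (int i = 0; i < ntasks; i++)    /* stand-in for parallel dispatch */
            acc = t->merge(acc, t->work(i));
        return acc;
    }

    /* Application-specific components supplied by the user. */
    static int square(int x)     { return x * x; }
    static int add(int a, int b) { return a + b; }

    int main(void) {
        FarmTemplate farm = { .workers = 4, .work = square, .merge = add };
        printf("sum of squares 0..9 = %d\n", farm_run(&farm, 10)); /* 285 */
        return 0;
    }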
... Many of the systems that employ a separation of specifications and code are based on the data-flow model. Example systems are CODE [10], DGL [22], LGDF [15] and Paralex [3]. Some of these models also provide hierarchical resolution of parallelism [10, 24]; others don't [3, 15, 22]. ...
Conference Paper
Full-text available
For almost a decade we have been working at developing and using template-based models for coarse-grained parallel computing. Our initial system, FrameWorks, was positively received but had a number of shortcomings. The Enterprise parallel programming environment evolved out of this work, and now, after several years of experience with the system, its shortcomings are becoming evident. This paper outlines our experiences in developing and using the two parallel programming systems. Many of our observations are relevant to other parallel programming systems, even though they may be based on different assumptions. Although template-based models have the potential for simplifying the complexities of parallel programming, they have yet to realize these expectations for high-performance applications.
... Starting with the late 80s, several pattern-based systems have been built with the intention to facilitate the rapid development of parallel applications through the use of pre-implemented, reusable components. Some of the earlier systems include Code [4] and Frameworks [28]. Some of the recent systems based on similar ideas are: Enterprise [26], Code2 [5], HeNCE [5], Tracs [2], and DPnDP [30]. ...
... These are some of the other essential features that are lacking in most of the existing pattern-based approaches to parallel computing. In the past, several parallel programming systems have supported frequently used parallel interactions [4, 5, 26, 28]. However, in all these cases a fixed set of high-level parallel interactions has been hard-coded into the system. ...
Article
The concept of design patterns has been extensively studied and applied in the context of object-oriented software design. Similar ideas are being explored in other areas of computing as well. Over the past several years, researchers have been experimenting with the feasibility of employing design-patterns related concepts in the parallel computing domain. In the past, several pattern-based systems have been developed with the intention to facilitate faster parallel application development through the use of pre-implemented and reusable components that are based on frequently used parallel computing design patterns. However, most of these systems face several serious limitations such as limited flexibility, zero extensibility, and the ad hoc nature of their components. Lack of flexibility in a parallel programming system limits a programmer to using only the high-level components provided by the system. Lack of extensibility here refers to the fact that most of the existing pattern-based parallel programming systems come with a set of pre-built patterns integrated into the system. However, the system provides no obvious way of increasing the repertoire of patterns when the need arises. Also, most of these systems do not offer any generic view of a parallel computing pattern, a fact which may be at the root of several of their shortcomings. This research proposes a generic (i.e., pattern- and application-independent) model for realizing and using parallel design patterns. The term "Parallel Architectural Skeleton" is used to represent the set of generic attributes associated with a pattern. The Parallel Architectural Skeleton Model (PASM) is based on the message-passing paradigm, which makes it suitable for a LAN of workstations and PCs. The model is flexible as it allows the intermixing of high-level patterns with low-level message-passing primitives. An object-oriented and library-based implementation of the model has been completed using C++ and MPI, without necessitating any language extension. The generic model and the library-based implementation allow new patterns to be defined and included into the system. The skeleton-library serves as a "framework" for the systematic, hierarchical development of network-oriented parallel applications.
... The CODE, CODE 2.0, and HeNCE languages [28-30] are based on graph structures, where nodes define simple operations and the edges represent the order of their execution. Such a visual approach allows for a better representation of the concurrency of computations. ...
Article
Full-text available
Distributed, large-scale computing is typically performed using textual general-purpose programming languages. This requires significant programming skills associated with the parallelisation and distribution of computations. In this paper, we present a visual (graphical) programming language called the Computation Application Language (CAL) to raise abstraction in distributed computing. CAL programs define computation workflows by visualising data flowing between computation units. The goal is to reduce the amount of traditional code needed and thus facilitate development even by non-professional programmers. The language follows the low-code paradigm, i.e. its implementation (the editor and the runtime system) is available online. We formalise the language by defining its syntax using a metamodel and specifying its semantics using a two-step approach. We define a translation of CAL into an intermediate language which is then defined using an operational approach. This formalisation was used to develop a programming and execution environment. The environment orchestrates computations by interpreting the intermediate language and managing the instantiation of computation modules using data tokens. We also present an explanatory case-study example that shows a practical application of the language.
... Data flow languages generally operate at the level of fundamental operations rather than at a functional granularity. One exception is CODE 2, which permits incorporation of sequential code into a dynamic flow graph, but restricts shared state to a special node type [Browne et al. 1989; Browne et al. 2000]. Data flow languages also typically prohibit global state. ...
Article
Programming high-performance server applications is challenging: it is both complicated and error-prone to write the concurrent code required to deliver high performance and scalability. Server performance bottlenecks are difficult to identify and correct. Finally, it is difficult to predict server performance prior to deployment. This paper presents Flux, a language that dramatically simplifies the construction of scalable high-performance server applications. Flux lets programmers compose off-the-shelf, sequential C, C++, or Java functions into concurrent servers. The Flux compiler type-checks programs and guarantees that they are deadlock-free. We have built a number of servers in Flux, including a web server with PHP support, an image-rendering server, a BitTorrent peer, and a game server. These Flux servers perform comparably to their counterparts written entirely in C. By tracking hot paths through a running server, Flux simplifies the identification of performance bottlenecks. The Flux compiler also automatically generates discrete event simulators that accurately predict actual server performance under load and with different hardware resources.
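Flux's actual syntax is not reproduced here; the pthreads sketch below (stage names invented) only illustrates the core idea of composing off-the-shelf sequential functions into a concurrent server, with the concurrency supplied around, rather than inside, the stages.

    #include <pthread.h>
    #include <stdio.h>

    /* Off-the-shelf sequential stages; the "program" is just their
       composition, and the runtime supplies the concurrency. */
    static int  parse(int req)  { return req * 10; }
    static int  render(int job) { return job + 1;  }
    static void respond(int r)  { printf("response %d\n", r); }

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_req = 0;

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);          /* claim the next request */
            int req = next_req < 8 ? next_req++ : -1;
            pthread_mutex_unlock(&lock);
            if (req < 0) return NULL;
            respond(render(parse(req)));        /* the composed flow */
        }
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        return 0;
    }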
... A compiler has many filters (lexical analysis, parsing, semantic analysis and code generation) through which a program passes, after which we get the final machine code. Other well-known examples of the pipe-and-filter style are programming in the Unix shell [1], the signal processing domain [2], parallel programming [3], functional programming [4] and distributed systems. ...
... There have been a few modeling efforts in the parallel programming domain. The CODE [10] programming language is based on a generalized dependency graph to express the computation in a unified parallel computation model without any implementation details. GASPARD [16] is another visual parallel programming environment supporting task and data parallelism. ...
Article
Full-text available
As the computation power in desktops advances, parallel programming has emerged as one of the essential skills needed by next generation software engineers. However, programs written in popular parallel programming paradigms have a substantial amount of sequential code mixed with the parallel code. Several such versions supporting different platforms are necessary to find the optimum version of the program for the available resources and problem size. As revealed by our study on benchmark programs, sequential code is often duplicated in these versions. This can affect code comprehensibility and re-usability of the software. In this paper, we discuss a framework named PPModel, which is designed and implemented to free programmers from these scenarios. Using PPModel, a programmer can separate parallel blocks in a program, map these blocks to various platforms, and re-execute the entire program. We provide a graphical modeling tool (PPModel) intended for Eclipse users and a Domain-Specific Language (tPPModel) for non-Eclipse users to facilitate the separation, the mapping, and the re-execution. This is illustrated with a case study from a benchmark program, which involves re-targeting a parallel block to CUDA and another parallel block to OpenMP. The modified program gave almost 5× performance gain compared to the sequential counterpart, and 1.5× gain compared to the existing OpenMP version.
... We feel that graph hierarchies are very useful for structuring large graphs. A somewhat different graph representation is used in CODE/ROPE [9, 8]. The user can specify dependencies between program components using dependency graphs. ...
... The skeleton-based approach to parallel programming is not something new. It was used in the 1980s in systems like CODE [5, 6] and FrameWorks [24, 25]. Some recent systems based on skeletons and similar techniques include CODE2 [6], Enterprise [22], HeNCE [6], PUL-TUF [28], TRAC [3] and DPnDP [26, 27]. ...
Article
One of the greatest obstacles to the mainstream adoption of parallel computing is its complexity. Over the years, various approaches have been proposed to aid parallel program developers. Most of these approaches employ a high-level model of parallel computation, thus hiding the low-level parallelism-related details from the user. Different models employ different abstraction techniques, such as communication libraries, macros, new parallel languages and abstract data types. In this paper we present a skeleton-based approach which uses frequently occurring structures for parallelism, and is a hybrid of high- and low-level models. Each skeleton is a re-usable, application-independent component providing a commonly used parallel structure. A number of such skeletons can be combined together to create the skeleton of the entire application, which can then be filled in with the application specific components. Unlike other skeleton-based approaches in the past, this work is unique in the following aspects: First, it gives a generic definition to a skeleton, with associated structural and behavioral components. The crucial behavioral components were missing in the related works of the past. Second, it gives a clear-cut and natural model to compose the individual skeletons to develop the entire parallel application. As a result, it is easy for the user to compose skeletons correctly. Third, unlike the previous approaches, the user can work at various levels of abstraction and also intermix them. For instance, the user can intermix skeletons with the lowest level of communication primitives available to him. This gives him a high degree of flexibility in developing his application. Fourth, a library-based approach, together with a generic definition of a skeleton, makes it a highly extendible approach, i.e. a new skeleton can be added to the system as per need. Some recent approaches, which intended to be extendible, were in fact hardly extendible due to the absence of a generic viewpoint of a skeleton. As a direct realization of the model, a library-based development system using object-oriented design methodologies in C++ and the standard Message-Passing Interface (MPI) has been implemented. The latter part of the paper focuses on the implementation and presents experimental results obtained on a cluster of workstations.
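The intermixing of a high-level skeleton with low-level message-passing primitives that the abstract describes might look roughly like the following C/MPI farm, where only user_work is application-specific. This is our sketch of the idea, not the paper's library API.

    #include <mpi.h>
    #include <stdio.h>

    static int user_work(int x) { return x * x; }  /* application component */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {                            /* skeleton: master side */
            for (int w = 1; w < size; w++)          /* one task per worker   */
                MPI_Send(&w, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
            for (int w = 1; w < size; w++) {
                int r;
                MPI_Recv(&r, 1, MPI_INT, w, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("result from %d: %d\n", w, r);
            }
        } else {                                    /* skeleton: worker side */
            int task, result;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            result = user_work(task);
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }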
... Related Work: The general concerns which led to the design of MANIFOLD are not new. The CODE system [45, 46] ...
Article
Full-text available
Management of the communications among a set of concurrent processes arises in many applications and is a central concern in parallel computing. In this paper we introduce MANIFOLD: a co-ordination language whose sole purpose is to describe and manage complex interconnections among independent, concurrent processes. In the underlying paradigm of this language the primary concern is not with what functionality the individual processes in a parallel system provide. Instead, the emphasis is on how these processes are interconnected and how their interaction patterns change during the execution life of the system. This paper also includes an overview of our implementation of MANIFOLD. As an example of the application of MANIFOLD, we present a series of small manifold programs which describe the skeletons of some adaptive recursive algorithms that are of particular interest in computer graphics. Our concern in this paper is to show the expressiveness of MANIFOLD, the feasibility of its implementation and its usefulness in practice. Issues regarding performance and optimization are beyond the scope of this paper.
... Examples of systems that use graphical notations to express parallel computations include Fel [30], Poker [37], CODE [14], Alex [32], LGDF [6] and apE [22]. None of these systems addresses fault tolerance nor provides a programming environment in the sense of Paralex. ...
Conference Paper
Full-text available
Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that need to be addressed in exploiting the parallelism inherent in a distributed system include heterogeneity, high-latency communication, fault tolerance and dynamic load balancing. Current software systems for parallel programming provide little or no automatic support towards these issues and require users to be experts in fault-tolerant distributed computing. The Paralex system is aimed at exploring the extent to which the parallel application programmer can be liberated from the complexities of distributed systems. Paralex is a complete programming environment and makes extensive use of graphics to define, edit, execute and debug parallel scientific applications. All of the necessary code for distributing the computation across a network and replicating it to achieve fault tolerance and dynamic load balancing is automatically generated by the system. In this paper we give an overview of Paralex and present our experiences with a prototype implementation.
... In these environments, a parallel program is specified as a graph with nodes containing a textual description of a sequential program. Most of the environments specify parallelism based on the large-grain dataflow model developed by Babb [2] (for example: CODE [4], DGL [11], LGDF [8], Paralex [1], PPSE [12], TDFL [19]). In these models, an application is usually defined as a dataflow graph whose nodes contain sequential modules and whose edges represent data dependencies between the modules. ...
Conference Paper
Full-text available
Workstation environments have been in use for more than a decade. Although a network of workstations represents a large amount of aggregate computing power, single users often cannot utilize these resources for their applications. Enterprise is a programming environment for designing, coding, debugging, testing, monitoring, profiling and executing programs in a distributed hardware environment. Programs written using Enterprise look like familiar sequential C code; the parallelism is expressed graphically. The system automatically inserts the code necessary to handle communication, synchronization and fault tolerance, allowing the rapid construction of correct distributed programs. Enterprise programs run on a network of computers, absorbing the idle cycles on machines. The system supports load balancing, limited process migration, and dynamic distribution of work in environments with changing resource utilization. This paper concentrates on the user's view of programming in Enterprise.
... In the past decade, many parallel pattern-based systems have been developed to employ design patterns related concepts in the HPC domain. Some of the systems based on similar ideas include Code [6], Frameworks [25], Enterprise [23], HeNCE [7], Tracs [5], DPnDP [26], and CO2P3S [4]. Unfortunately, most of these systems lack practical usability for the CB field because of the following reasons: ...
Conference Paper
Full-text available
Computational biology research is now faced with the burgeoning volume of genome data. The rigorous post-processing of this data requires an increased role for high performance computing (HPC). Because the development of HPC applications for computational biology problems is much more complex than that of the corresponding sequential applications, existing traditional programming techniques have demonstrated their inadequacy. Many high-level programming techniques, such as skeleton and pattern based programming, have therefore been designed to provide users new ways to get HPC applications without much effort. However, most of them remain absent from the mainstream practice for computational biology. In this paper, we present a new parallel pattern-based system prototype for computational biology. The underlying programming techniques are based on generic programming, a programming technique suited for the generic representation of abstract concepts. This allows the system to be built in a generic way at application level and thus provides good extensibility and flexibility. We show how this system can be used to develop HPC applications for popular computational biology algorithms and lead to significant runtime savings on distributed memory architectures.
... Data flow languages generally operate at the level of fundamental operations rather than at a functional granularity. One exception is CODE 2, which permits incorporation of sequential code into a dynamic flow graph, but restricts shared state to a special node type [7, 8]. Data flow languages also typically prohibit global state. ...
Conference Paper
Programming high-performance server applications is challenging: it is both complicated and error-prone to write the concurrent code required to deliver high performance and scalability. Server performance bottlenecks are difficult to identify and correct. Finally, it is difficult to predict server performance prior to deployment. This paper presents Flux, a language that dramatically simplifies the construction of scalable high-performance server applications. Flux lets programmers compose off-the-shelf, sequential C or C++ functions into concurrent servers. Flux programs are type-checked and guaranteed to be deadlock-free. We have built a number of servers in Flux, including a web server with PHP support, an image-rendering server, a BitTorrent peer, and a game server. These Flux servers match or exceed the performance of their counterparts written entirely in C. By tracking hot paths through a running server, Flux simplifies the identification of performance bottlenecks. The Flux compiler also automatically generates discrete event simulators that accurately predict actual server performance under load and with different hardware resources.
... In the past decade, many parallel pattern-based systems have been developed to employ design patterns related concepts in the parallel computing domain in the context of object-oriented programming techniques. Some of the systems based on similar ideas include Code [2], Frameworks [15], Enterprise [16], HeNCE [4], Tracs [3], and DPnDP [14]. However, most of these systems lack practical usability for the following reasons [1,9]: ...
Conference Paper
Full-text available
Parallel program design patterns provide users a new way to get parallel programs without much effort. However, it is always a serious limitation for most existing parallel pattern-based systems that there is no generic description for the structure and behavior of a pattern at application level. This limitation has so far greatly hindered the practical use of these systems. In this paper, we present a new parallel pattern-based system for bioinformatics. The underlying programming techniques are based on generic programming, a programming technique suited for the generic representation of abstract concepts. This allows the new system to be built in a generic way at application level. We show how this system efficiently addresses the shortcomings of existing systems and leads to significant runtime savings for some popular applications in bioinformatics on PC clusters.
... Many IEs were introduced in the last two decades. Among them are CODE [10], HeNCE [8], GRAPNEL [13], GRADE [12], and TRAPPER [11]. ...
Article
Full-text available
In a wide variety of scientific parallel applications, both task and data parallelism must be exploited to achieve the best possible performance on a multiprocessor machine. These applications induce task-graph parallelism with coarse-grain granularity. Nevertheless, using the available task-graph parallelism and combining it with data parallelism can increase the performance of parallel applications considerably since an additional degree of parallelism is exploited. The OpenMP standard supports data parallelism but does not support task-graph parallelism. In this paper we present an integration of task-graph parallelism in OpenMP by extending the parallel sections constructs to include task-index and precedence-relations matrix clauses. There are many ways in which task-graph parallelism can be supported in a programming environment. A fundamental design decision is whether the programmer has to write programs with explicit precedence relations, or if the responsibility of precedence relations generation is delegated to the compiler. One of the benefits provided by parallel programming models like OpenMP is that they liberate the programmer from dealing with the underlying details of communication and synchronization, which are cumbersome and error-prone tasks. If task-graph parallelism is to find acceptance, writing task-graph parallel programs must be no harder than writing data parallel programs, and therefore, in our design, precedence relations are described through simple programmer annotations, with implementation details handled by the system. This paper concludes with a description of several parallel application kernels that were developed to study the practical aspects of task-graph parallelism in OpenMP. The examples demonstrate that exploiting data and task parallelism in a single framework is the key to achieving good performance in a variety of applications.
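The paper's proposed task-index and precedence-relations clauses predate standardized support, so purely as a point of comparison: since OpenMP 4.0, a task graph such as a -> b, a -> c, {b, c} -> d can be expressed with standard depend clauses, as in this C sketch.

    #include <stdio.h>

    int main(void) {
        int a = 0, b = 0, c = 0, d = 0;
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: a)
            a = 1;
            #pragma omp task depend(in: a) depend(out: b)
            b = a + 1;
            #pragma omp task depend(in: a) depend(out: c)
            c = a + 2;
            #pragma omp task depend(in: b, c)   /* waits for both b and c */
            d = b + c;
            #pragma omp taskwait
            printf("d = %d\n", d);              /* prints d = 5 */
        }
        return 0;
    }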
... Phred is similar to several other visual parallel programming environments, namely Code [9,20], HeNCE [4,5], Paralex [1,2], and Schedule [12]. However, Phred is unique in its graph structures and its emphasis on determinacy. ...
Article
Phred is a visual parallel programming language in which programs can be statically analyzed for deterministic behavior. This paper presents the Phred language, techniques for analyzing the language, and a programming environment which supports Phred programming. There are many methods for specifying synchronization and data sharing in parallel programs. The Phred programmer uses graph constructs for describing parallelism, synchronization and data sharing. These graphs are formally described in this paper as a graph grammar. The use of graphs in Phred provides an intuitive and visual representation for parallel computations. The inadvertent specification of nondeterministic computations is a common error in parallel programming. Phred addresses the issue of determinacy by visually indicating regions of a program where nondeterminacy may exist. This analysis and its integration into a programming environment is presented here. The Phred programming environment supports the specification, analysis and execution of Phred programs. The distribution of the programming environment itself over several workstations is also described.
Chapter
We sketch the reasons for the I/O bottleneck in parallel and distributed systems, pointing out that it can be viewed as a special case of a general bottleneck that arises at all levels of the memory hierarchy. We argue that because of its severity, the I/O bottleneck deserves systematic attention at all levels of system design. We then present a survey of the issues raised by the I/O bottleneck in five key areas of parallel and distributed systems: applications, algorithms, compilers, operating systems and architecture. Finally, we address some of the trends we observe emerging in new paradigms of parallel and distributed computing: the convergence of networking and I/O, I/O for massively distributed “global information systems” such as the World Wide Web, and I/O for mobile computing and wireless communications. These considerations suggest exciting new research directions in I/O for parallel and distributed systems in the years to come.
Chapter
Parsec is a parallel programming environment whose goal is to simplify the development of multicomputer programs without, as is often the case, sacrificing performance. We have reconciled these objectives by “compiling” the structure of parallel applications into information to configure each of a small set of communication primitives on a context sensitive basis. In this chapter we show how Parsec can be used to implement a high-performance processor farm and compare Parsec and hand-optimized implementations to demonstrate that Parsec can achieve a similar level of performance. Extensive static analysis and optimization is necessary to achieve these results. We discuss both the tools which perform these tasks as well as the user interface that provides the necessary declarative structural information. Using the processor farm, we show how Parsec simplifies the task of specifying the structure of a parallel application and improves the result by supporting abstraction, reuse and scalability.
Chapter
Support for the programming of distributed computing systems has been a primary focus of distributed computing research. It has been recognized that programming a distributed system is more difficult than programming a centralized system. Many of the functions, such as task mapping, interprocess communication, remote invocation, synchronization, and reconfiguration, are very difficult to program. Tools that support parallel and distributed programming can greatly simplify such programming tasks.
Chapter
Wide area computer networks have become a basic part of today's computing infrastructure. These networks connect a variety of machines, presenting an enormous computing resource. In this project we focus on developing methods and tools which allow a programmer to tap into this resource. In this talk we describe PVM and HeNCE, tools and a methodology under development that assist a programmer in developing programs to execute on a networked group of heterogeneous machines. HeNCE is implemented on top of a system called PVM (Parallel Virtual Machine). PVM is a software package that allows the utilization of a heterogeneous network of parallel and serial computers as a single computational resource. PVM provides facilities for spawning, communication, and synchronization of processes over a network of heterogeneous machines. While PVM provides the low-level tools for implementing parallel programs, HeNCE provides the programmer with a higher-level abstraction for specifying parallelism.
Article
Full-text available
We present a framework for a high-level toolkit for solving partial differential equations. The requirements for very large and complex PDE applications such as computational dynamics and numerical relativity are examined in the framework of a modular toolkit approach based on visual programming. We address some of the principal non-numerical technical challenges: software integration, scheduling and distribution of the computation over a metacomputer. We also discuss some of the challenges found in creating run-time support systems and parallel grid generation modules for future systems.
Article
Designing parallel, distributed computations is a significant barrier to the effective use of contemporary equipment. One aspect of the barrier is the difficulty of partitioning a serial solution into a set of communicating computational subsets (e.g., processes) that can be distributed over heterogeneous processors in a distributed hardware environment. The Parallel Distributed computation Graph Model (ParaDiGM) and the VISual Assistant (VISA) have been designed to assist with the partitioning problem. The formal model is composed of two components: a micro model focuses on the functionality of the computation, while a consistent macro model explicitly represents the partition and the communication mechanisms. ParaDiGM encourages the designer to address functionality and partitioning in different submodels, maintaining a mapping between elements in the two submodels. ParaDiGM is formal, but has an intuitive visual presentation; its use is supported by the VISual Assistant (VISA), a tool for designing, animating, simulating, and prototyping distributed computations. This note informally describes ParaDiGM and VISA, then illustrates how they can be used to assist with the design of parallel, distributed computations.
Article
Current approaches to software engineering practice for parallel systems are reviewed. The parallel software designer has not only to address the issues involved in the characterization of the application domain and the underlying hardware platform, but, in many instances, the production of portable, scalable software is desirable. In order to accommodate these requirements, a number of specific techniques and tools have been proposed, and these are discussed in this review in the framework of the parallel software life-cycle. The paper outlines the role of formal methods in the practical production of parallel software, but its main focus is the emergence of development methodologies and environments. These include CASE tools and run-time support systems, as well as the use of methods taken from experience of conventional software development. Because of the particular emphasis on performance of parallel systems, work on performance evaluation and monitoring systems is considered.
Article
A concurrent software application, whether running on a single machine or distributed across multiple machines, is composed of tasks that interact (communicate and synchronize) in order to achieve some goal. Developing such concurrent programs so they cooperate effectively is a complex task, requiring that programmers craft their modules (the components from which concurrent applications are built) to meet both functional requirements and communication requirements. Unfortunately the result of this effort is a module that is difficult to reason about and even more difficult to reuse. Making programmers treat too many diverse issues simultaneously leads to increased development costs and opportunities for error. This suggests the need for ways that a developer may specify control requirements separately from the implementation of functional requirements, but then have this information used automatically when building the component executables. The result is an environment where programmers have increased flexibility in composing software modules into concurrent applications, and in reusing those same modules. This paper describes our research toward a technology for control integration, where we have developed techniques for users to express control objectives for an application and a system that translates those specifications for use in packaging executables.
Article
In this expository overview, I briefly review the basics of computer architecture as they relate to parallel computers. Distributed-memory multiprocessor systems are emphasized. I cover methods to parallelize some fundamental types of ecological simulation models: foodweb models, individual-based population models, population models based on partial differential equations, and individual movement models. Recent developments in parallel operating systems and programming tools on multiprocessors are reviewed. Because of complex relationships between parallel computer architecture and efficient algorithms, I conclude that ecological modelers will need to become more acquainted with hardware than previously.
Article
We report on an experience of implementing process farms on distributed systems. Rather than focusing on applications, we analyse in detail the techniques we have used for implementing the corresponding support mechanisms. They are actually part of a more general framework that can be easily extended to include other parallel programming paradigms. We try to substantiate the claim that our highly modular structuring may constitute both a practical and powerful approach to several problems of distributed programming support.
Article
The Parallel Evaluation and Experimentation Platform (PEEP) is the result of an effort at Rome Laboratory to identify the most promising general-purpose software development tools, techniques and approaches from industry and academia for programming high performance parallel computers to meet the needs of Command and Control (C2) applications. The PEEP is a prototype platform for evaluating the applicability of results from parallel programming research efforts to improve the productivity of designers and developers. Intermetrics conducted a study of available innovative tools and techniques beginning in early 1990. From the survey, Intermetrics chose candidates for inclusion on a prototype platform, and began to install and evaluate the chosen components. With the prototype PEEP, a number of case studies were conducted to develop small parallel programs using the selected tools. The purpose of these case studies was not to advance the state of the art in parallel algorithms, but to exercise the tools collected for the prototype PEEP. This work identified requirements on architectures, life cycle activities and technologies to support parallel development and developed a long range plan for the PEEP. The conclusions from these case studies also suggest useful methodologies for developing parallel software, and have led to recommendations based on the performance of the current tools and on the projected needs of parallel software development.
Article
A software development framework for parallel processing systems based on the parallel object-oriented functional computation model PROOF is evaluated. PROOF/L, a C++ based programming language with additional parallel constructs required by PROOF, is extended to include an array data type and input/output features to make PROOF/L easier to use in developing software for parallel processing systems. The front-end translator from PROOF/L to the intermediate form IF1, and the back-end translators from IF1 to the C language on two different MIMD parallel machines, nCube and KSR, are developed. Our framework is evaluated by comparing it with existing software development approaches for parallel processing systems. Our framework is suitable for large-scale parallel software development because it supports the concepts of hierarchical design and shared data, and frees the software developer from considering explicit synchronization, communication, and parallelism. The software development efforts using our framework can be greatly reduced due to implicit synchronization and communication and the compactness of PROOF/L programs. The extension of PROOF/L and the integration of PROOF/L with other programming languages to utilize existing library functions written in languages such as C and FORTRAN are also discussed.
Article
Full-text available
Complete application tasks, of the type that would be of interest to Rome Laboratory, are large and complex. One approach to dealing with them is heterogeneous computing. Two types of heterogeneous computing systems are: (1) mixed-mode, wherein multiple types of parallelism are available on a single machine; and (2) mixed-machine, wherein a suite of different high-performance computers is connected by high-speed links. In this effort, we studied ways to decompose an application into subtasks and then match each subtask to the mode or machine, which results in the smallest total task execution time. Our accomplishments include: (1) conducting a mixed-mode case study; (2) developing an approach for automatically decomposing a task for mixed-mode execution, and assigning modes to subtasks; (3) extending this approach for use as an heuristic for a particular class of mixed-machine heterogeneous computing systems; (4) surveying the state-of-the-art of heterogeneous computing, and constructing a conceptual framework for automatic mixed-machine heterogeneous computing; (5) examining how to estimate non-deterministic execution of subtasks and complete tasks; and (6) devising an optimal scheme for inter-machine data transfers for a given matching of subtasks to machines.
Article
Today's supercomputers and parallel computers provide an unprecedented amount of computational power in one machine. A basic understanding of the parallel computing techniques that assist in the capture and utilization of that computational power is essential to appreciate the capabilities and the limitations of parallel supercomputers. In addition, an understanding of technical vocabulary is critical in order to converse about parallel computers. The relevant techniques, vocabulary, currently available hardware architectures, and programming languages which provide the basic concepts of parallel computing are introduced in this document. This document updates the document entitled Introduction to Parallel Supercomputing, M88-42, October 1988. It includes a new section on languages for parallel computers, updates the hardware-related sections, and includes current references.
Article
Parsec is a parallel programming environment whose goal is to simplify the development of multicomputer programs without, as is often the case, sacrificing performance. We have reconciled these objectives by "compiling" the structure of parallel applications into information to configure each of a small set of communication primitives on a context-sensitive basis. In this paper, we show how Parsec can be used to implement a high-performance processor farm and compare Parsec and hand-optimized implementations to demonstrate that Parsec can achieve a similar level of performance. Extensive static analysis and optimization is necessary to achieve these results. We discuss both the tools which perform these tasks as well as the user interface that provides the necessary declarative structural information. Using the processor farm, we show how Parsec simplifies the task of specifying the structure of a parallel application and improves the result by supporting abstraction, reuse and scalability.
Article
Full-text available
In this paper we present and discuss a real experience of reusing sequential software in a parallel and physically distributed computing environment. Specifically, we have combined the functionalities of two existing systems previously developed at our Department. One, Tracs, is a programming environment for networked, heterogeneous machines that, among other things, is able to generate process farms out of pure sequential code. The other, SPACE, is a graphical tool that generates sequential Fortran programs for simulating digital transmission systems. We have implemented a tool that restructures SPACE-generated programs to let them match the input required by the Tracs process farm generator. The result is that users of SPACE can transparently take advantage of networked and heterogeneous workstations to run their simulations. We have tackled the problems arising from both parallelism and distribution. The techniques we have used can be easily applied to any problem that can be modelled according to the process farm paradigm. Moreover, our experience shows that the Tracs framework may constitute a sound basis for facilitating engineering efforts on the reuse of sequential software in distributed environments.
Chapter
Performance engineering of parallel and distributed applications is a complex task that iterates through various phases, ranging from modeling and prediction, to performance measurement, experiment management, data collection, and bottleneck analysis. There is no evidence so far that all of these phases should/can be integrated in a single monolithic tool. Moreover, the emergence of Cloud computing as well as established Grid infrastructures as a wide-area platform for high-performance computing raises the idea to provide tools as interacting Web services that share resources, support interoperability among different users and tools, and most important provide omni-present services over Grid or Cloud infrastructures.
Chapter
Full-text available
The most visible facet of the Computationally-Oriented Display Environment (CODE) is its graphical interface. However, the most important fact about CODE is that it is a programming system based on a formal unified computation graph model of parallel computation which was intended for actual program development. Most previous programming systems based on formal models of computation have been intended primarily to serve as specification systems. This paper focuses on the interaction between the development of the formal model of parallel computation and the development of a practical programming environment. Basing CODE on a formal model of parallel computation was integral to attaining the initial project goals: raising the level of abstraction at which parallel program structure is represented and achieving architectural independence. It also led to other significant research directions, such as a calculus of composition for parallel programs, and has suggested other directions of research in parallel programming that we have not yet had the opportunity to pursue. We hope this experience with the interaction of the theoretical and the practical may be of interest and benefit to other designers and developers of parallel programming systems.
Chapter
The ParaGraph graph editor is a tool for specifying the graphical structure of parallel algorithms. Based on an extended formalism of Aggregate Rewriting Graph Grammars, it is an improvement on existing techniques for describing the families of regular, scalable communication graphs. We expect that ParaGraph will prove useful as a testbed for new techniques for describing, visualizing and analyzing the structure of very large graphs. This work describes ongoing formal (and practical) efforts to make ParaGraph an effective tool for specifying massive parallelism.
Conference Paper
Design patterns make it easier to reuse successful designs and architectures. Expressing proven techniques as design patterns makes them more accessible to developers of new systems and helps a designer arrive at a design faster. In the sequential and object-oriented programming domains, design patterns have played a very important role, but in parallel programming they have seen little application. We propose a design-pattern based parallel programming model and implement a parallel programming environment on the SMP platform to help developers build their parallel application systems efficiently. The experimental results show our system is effective and competent.
Article
The Task-Level Dataflow Language is a graphical language for architecture-independent parallel programming and is intended for the writing of new programs and the adaptation of existing ones. It is the first coarse-grained dataflow language that supports dynamic modification of program graphs. It provides a systematic use of program constructs to support particular programming styles, such as nondeterminism, iteration, and replication. It has been used successfully in a course on parallel programming.
Conference Paper
We have shown how graphical languages such as CODE/ROPE and PPSE can be used to design SIMD or data parallel programs. The advantages of this approach are machine independence, design clarity, automated program analysis, and accelerated software development. The disadvantages are that many problems remain in this approach, such as how to reduce design clutter, how to automate optimal processor mapping, and how to perform message-passing optimization. More work is needed before this approach can be used in the design of large-scale applications, but we believe the approach is promising. The main contribution is to show how the simple ideas of stencils, stream generators, and replicators can be used effectively to extend the classical dataflow design paradigm into the data parallel design paradigm. While more work is needed, these three ideas lead to greater expressiveness of design. The PPSE toolset currently does much of the mapping, scheduling, and automatic code generation described in this paper. However, PPSE does not currently handle data parallel programming. Work is progressing toward a full implementation of these ideas. Even so, PPSE has been invaluable for performing a variety of what-if analyses on parallel programs. Insights have been gained with this approach that would not be possible with a purely textual representation of the parallel program.
Conference Paper
This paper describes the use of the UNITY [6] notation and the UC compiler [2] in the design of parallel programs for execution on the Connection Machine CM2 (CM). We illustrate our ideas in the context of a computer simulation of particle diffusion and aggregation in porous media. We begin with a UNITY specification, progressively refine the specification to derive a UNITY program, translate the program to UC abstractions, which may be further refined to improve efficiency, and finally implement the program on the CM. Performance results on the efficiency of the program constructed using this approach are also included.
Conference Paper
Full-text available
An architecture-independent software development approach for parallel processing systems is presented. This approach is based on the parallel object-oriented and functional computation model PROOF and separates the architecture-dependent issues from software development. It also facilitates software development for any parallel processing system by relieving the programmers from the consideration of processor topology and various parallelization aspects of the software. Our approach allows the exploitation of parallelism at both levels of granularity: object level and method level, thereby making our approach effective for software development for various MIMD computers. Software developed using our approach reflects the parallel structure of the problem space, which makes the software more understandable and modifiable. A framework consisting of object-oriented analysis, object-design, coding and transformation phases is presented for software development for parallel processing systems. An example is given to illustrate this approach.
Conference Paper
Full-text available
Programming languages that can utilize the underlying parallel architecture in shared memory, distributed memory or Graphics Processing Units (GPUs) are used extensively for solving scientific problems. However, from our observation of studying multiple parallel programs from various domains, such programming languages have a substantial amount of sequential code mixed with the parallel code. When rewriting the parallel code for another platform, the same sequential code is often reused without much modification. Although this is a common occurrence, existing tools and programming environments do not offer much support for this process. In this paper, we introduce a tool named PPmodel, which was designed and implemented to assist programmers in separating the core computation from the details of a specific parallel architecture. Using PPmodel, a programmer can identify and retarget the parallel section of a program to execute in a different platform. With PPmodel, a programmer is better enabled to focus on the parallel section of interest, while ignoring other parallel and sequential sections in a program. The tool is explained by example execution of the parallel section of an OpenMP program for the circuit satisfiability problem in a cluster using the Message Passing Interface (MPI).
Conference Paper
Application of pattern-based approaches to parallel programming is an active area of research today. The main objective of pattern-based approaches to parallel programming is to facilitate the reuse of frequently occurring structures for parallelism whereby a user supplies mostly the application specific code-components and the programming environment generates most of the code for parallelization. Parallel Architectural Skeleton (PAS) is such a pattern-based parallel programming model and environment. The PAS model provides a generic way of describing the architectural/structural aspects of patterns in message-passing parallel computing. Application development using PAS is hierarchical, similar to conventional parallel programming using MPI, however with the added benefit of reusability and high-level patterns. Like most other pattern-based parallel programming models, the benefits of PAS were offset by some of its drawbacks such as difficulty in: (1) extending PAS and (2) skeleton composition. SuperPAS is an extension of PAS that addresses these issues. SuperPAS provides a skeleton description language for the generic PAS. Using SuperPAS, a skeleton developer can extend PAS by adding new skeletons to the repository (i.e., extensibility). SuperPAS also makes the PAS system more flexible by defining composition of skeletons. In this paper, we describe SuperPAS and elaborate its use through examples.
Article
This paper describes Parallel Proto (PProto), an integrated environment for constructing prototypes of parallel programs. Using functional and performance modeling of dataflow specifications, PProto assists in analysis of high-level software and hardware architectural tradeoffs. Facilities provided by PProto include a visual language and an editor for describing hierarchical dataflow graphs, a resource modeling tool for creating parallel architectures, mechanisms for mapping software components to hardware components, an interactive simulator for prototype interpretation, and a reuse capability. The simulator contains components for instrumenting, animating, debugging, and displaying results of functional and performance models. The PProto environment is built on top of a substrate for managing user interfaces and database objects to provide consistent views of design objects across system tools.
Article
Performance orientation in the development process of parallel software is motivated by outlining the misconception of current approaches, where performance activities come in only at the very end of development, mainly as measurement or monitoring after the implementation phase. At that time, the major part of the development work is already done, and performance pitfalls are very hard to repair, if they can be repaired at all. A development process for parallel programs that launches performance ...
Article
Full-text available
A simple program that approximates π by numerical quadrature is rewritten to run on nine commercially available processors to illustrate the complications that arise in parallel programming in FORTRAN. The machines used are the Alliant FX/8, BBN Butterfly, Cray X-MP/48, ELXSI 6400, Encore Multimax, Flex/32, IBM 3090/VF, Intel iPSC, and Sequent Balance. Some general impediments to using parallel processors for production work are identified.
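The underlying kernel is small; a hedged modern rendering in C++ with OpenMP (the paper's versions were FORTRAN) approximates π = ∫₀¹ 4/(1+x²) dx by the midpoint rule:

    #include <cstdio>

    int main() {
        const long n = 10000000;       // number of subintervals
        const double h = 1.0 / n;
        double sum = 0.0;

        // Each iteration is independent; the reduction combines partial sums.
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < n; ++i) {
            double x = (i + 0.5) * h;  // midpoint of subinterval i
            sum += 4.0 / (1.0 + x * x);
        }
        std::printf("pi ~= %.12f\n", h * sum);
    }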
Article
The work of Adams, Karp and Miller, Luconi, and Rodriguez on formal models for parallel computations and computer systems is reviewed. A general definition of a parallel schema is given so that the similarities and differences of the models can be discussed. Primary emphasis is on the control structures used to achieve parallel operation and on properties of the models such as determinacy and equivalence. Decidable and undecidable properties are summarized.
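Determinacy, the central property above, can be seen in a few lines (a hypothetical illustration, not one of the reviewed models): two unordered, conflicting writes make the result depend on the interleaving, which is exactly what a determinate schema rules out.

    #include <atomic>
    #include <iostream>
    #include <thread>

    int main() {
        std::atomic<int> x{0};
        std::thread a([&] { x.store(1); });  // writer 1
        std::thread b([&] { x.store(2); });  // writer 2, conflicts with 1
        a.join();
        b.join();
        // Prints 1 or 2 depending on which store happened last:
        // the computation is not determinate.
        std::cout << x.load() << '\n';
    }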
Article
In this paper we briefly describe and compare a number of theoretical models for parallel computation; namely, Petri nets, computation graphs, and parallel program schemata. We discuss various problems and properties of parallel computation that can be studied within these formulations and indicate the ties between these properties and the more practical aspects of parallel computation. We show how marked graphs, a particular type of Petri net, are a restricted type of computation graph and indicate how some results of marked graphs can be obtained from known results of computation graphs. Also, for schemata we discuss the decidability versus undecidability of various properties and several techniques of schemata composition.
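A marked graph's firing rule is simple enough to state as code; the sketch below (hypothetical data structures, not from the paper) lets a node fire when every input edge holds a token, moving one token from each input edge to each output edge.

    #include <cstdio>
    #include <vector>

    struct Edge { int tokens; };
    struct Node { std::vector<int> in, out; };  // indices into the edge list

    // A node is enabled when every incoming edge carries at least one token.
    bool enabled(const Node& n, const std::vector<Edge>& e) {
        for (int i : n.in) if (e[i].tokens == 0) return false;
        return true;
    }

    // Firing consumes one token per input edge and produces one per output edge.
    void fire(const Node& n, std::vector<Edge>& e) {
        for (int i : n.in)  --e[i].tokens;
        for (int i : n.out) ++e[i].tokens;
    }

    int main() {
        // Two-node cycle: edge 0 feeds node B from A, edge 1 feeds A from B.
        std::vector<Edge> edges{{1}, {0}};    // one token, on edge 0
        std::vector<Node> nodes{{{1}, {0}},   // A: consumes e1, produces e0
                                {{0}, {1}}};  // B: consumes e0, produces e1
        for (int step = 0; step < 4; ++step)
            for (auto& n : nodes)
                if (enabled(n, edges)) fire(n, edges);
        std::printf("tokens: e0=%d e1=%d\n", edges[0].tokens, edges[1].tokens);
    }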
Article
The emergence of commercially produced parallel computers has greatly increased the problem of producing transportable mathematical software. Exploiting these new parallel capabilities has led to extensions of existing languages such as FORTRAN and to proposals for entirely new parallel languages. We present an attempt at a short-term solution to the transportability problem, motivated by the desire to extend capabilities beyond loop-based parallelism and to provide a convenient, machine-independent user interface. A package called SCHEDULE is described which provides a standard user interface to several shared-memory parallel machines. A user writes standard FORTRAN code and calls SCHEDULE routines that express and enforce the large-grain data dependencies of a parallel algorithm. Machine dependencies are internal to SCHEDULE and change from one machine to another, but the user's code remains essentially the same across all such machines. The semantics and usage of SCHEDULE are described, and several examples of parallel algorithms implemented using SCHEDULE are presented.
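The flavor of large-grain dependency programming can be suggested with standard C++ futures (a stand-in for the idea only, not SCHEDULE's actual FORTRAN interface): the dependences (A, C) and (B, C) are enforced by waiting on A's and B's results before launching C, which is the role SCHEDULE's dependency declarations play.

    #include <future>
    #include <iostream>

    int main() {
        auto a = std::async(std::launch::async, [] { return 2; });  // task A
        auto b = std::async(std::launch::async, [] { return 3; });  // task B

        // Enforce the large-grain dependences (A, C) and (B, C): C cannot
        // start until both of its inputs are available.
        int av = a.get(), bv = b.get();
        auto c = std::async(std::launch::async, [=] { return av * bv; });
        std::cout << c.get() << '\n';  // prints 6
    }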
Conference Paper
PFG (parallel flow graphs), a language for expressing concurrent, time-dependent computations, is described. PFG is rich enough to express many of the common concurrent control structures found in parallel languages, as well as some less common ones. Each syntactic structure in PFG has a direct translation into a portion of a time Petri net model. The net created by legally combining PFG structures is guaranteed to be well-formed, in the sense that each Petri net is in the free-choice class and has a clear interpretation in terms of a hardware/software system. Several techniques have been defined which allow the model produced from a PFG program to be analyzed for concurrency properties, such as deadlock freedom and proper mutual exclusion on shared data structures.
Article
The Navier-Stokes computer is a high-performance, reconfigurable, pipelined machine designed to solve large computational fluid dynamics problems. Due to the complexity of the architecture, development of effective, high-level language compilers for the system appears to be a very difficult task. Consequently, a visual programming methodology has been developed which allows users to program the system at an architectural level by constructing diagrams of the pipeline configuration. These schematic program representations can then be checked for validity and automatically translated into machine code. The visual environment is illustrated by using a prototype graphical editor to program an example problem.
Conference Paper
The goals for the Computation Oriented Display Environment (CODE) are to provide representational power sufficient for facile expression of a wide class of parallel algorithms, to permit compilation to reasonably efficient programs on a wide spectrum of parallel execution environments, and to provide a hierarchical approach to the development of parallel programs. CODE is based on a formally specified model of parallel computation which covers most conventional MIMD models and is formulated at a higher level of abstraction than conventional MIMD shared-name-space and partitioned-name-space models. The conceptual foundation of CODE, in particular basing the language on an abstract model of parallel computation, has led to two significant capabilities that had not been anticipated: a calculus of composition, which may be exploitable for automated or semiautomated program construction, and a natural basis for highly effective component reuse.
A Comparison of 12 Parallel Fortran Dialects
  • A.H. Karp
  • R.G. Babb
A.H. Karp and R.G. Babb, "A Comparison of 12 Parallel Fortran Dialects," IEEE Software, Sept. 1988, pp. 52-68.
Programming with CODE: A Computation-Oriented Display Environment
  • J.C. Browne
J.C. Browne, M. Azam, and C.L. Lin, "Programming with CODE: A Computation-Oriented Display Environment," tech. report, Computer Sciences Dept., Univ. of Texas, Austin, Texas, 1988.
A Survey of Models for Parallel Computing
  • T.H. Bredt
T.H. Bredt, "A Survey of Models for Parallel Computing," Tech. Report 8, Digital Systems Lab.
A Comparison of Models of Parallel Computation
  • J.L. Peterson
  • T.H. Bredt
J.L. Peterson and T.H. Bredt, "A Comparison of Models of Parallel Computation," Proc., 1974.
ROPE User's Manual: A Reusability-Oriented Parallel Programming Environment
  • T.J. Lee
  • C.L. Lin
A Visual Programming Environment for the Navier-Stokes Computer
  • S. Tomboulian
  • T.W. Crockett
  • D. Middleton
A Constructive Unified Model of Parallel Computation
  • S.M. Sobek