Article

Synchronization and control of parallel algorithms

Authors:
  • Math Cube Associates, Inc.

Abstract

We propose a modest collection of primitives for synchronization and control in parallel numerical algorithms. These are phrased in a syntax that is compatible with FORTRAN, creating a publication language for parallel software. A preprocessor may be used to map code written in this extended FORTRAN into standard FORTRAN with calls to the run-time libraries of the various parallel systems now in use. We solicit the reader's comments on the clarity, as well as the adequacy, of the primitives we have proposed.
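The primitives themselves are not reproduced on this page. Purely as an illustration of the publication-language idea, the sketch below shows how the same pattern survives in modern Fortran with OpenMP directives, which play the role of the proposed extensions and are likewise mapped onto a run-time library; the directive syntax is an assumption of this note, not the paper's notation.

    ! A minimal sketch, assuming Fortran + OpenMP as a stand-in for the
    ! paper's extended FORTRAN; compile with: gfortran -fopenmp
    program publication_language_sketch
      implicit none
      integer :: i
      real :: a(1000), b(1000)
      b = 1.0
      ! The directive marks the loop as parallel, much as an extended-
      ! FORTRAN keyword would; a compiler without OpenMP treats it as a
      ! comment, preserving the single-source, standard-FORTRAN property.
      !$omp parallel do
      do i = 1, 1000
         a(i) = 2.0*b(i) + real(i)
      end do
      !$omp end parallel do
      print *, a(1), a(1000)
    end program publication_language_sketch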


... Classical synchronization control theory is mainly divided into parallel control, master-slave control, and cross-coupling control (CCC). [4][5][6] Among them, CCC was first proposed by Koren 7 for a biaxial motion platform. This synchronization control scheme adopts the velocity or angle error of each subsystem as an additional feedback signal to reflect the load variation of any axis. ...
Article
Full-text available
For a hybrid mechanism used for automobile electro-coating conveying, a chattering-free sliding mode synchronization controller is proposed to improve synchronization performance during motion. The Jacobian matrix of the mechanism is calculated from a kinematic analysis, and the dynamic model is established by the Lagrange method. Since the mechanism consists of two sets of hybrid mechanisms arranged bilaterally, a novel synchronization error is designed that includes the synchronization error between the two ends of the end-effector. By combining cross-coupling control with chattering-free sliding mode control, a novel chattering-free sliding mode synchronization controller is obtained. The stability of the proposed algorithm is proved by the Lyapunov stability theorem. Simulation and experimental results show that the proposed controller effectively reduces chattering in the driving forces and further improves the synchronization performance and tracking accuracy of the system.
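For orientation, the textbook two-axis form of cross-coupling control (a generic sketch, not this paper's exact error definition, which additionally couples the two ends of the end-effector) builds coupled errors from the individual tracking errors:

    e_i = q_i^d - q_i \quad (i = 1, 2)                                  % per-axis tracking errors
    \varepsilon = e_1 - e_2                                             % synchronization error
    E_1 = e_1 + \beta\,\varepsilon, \qquad E_2 = e_2 - \beta\,\varepsilon   % coupled errors, gain \beta > 0

Driving both coupled errors to zero forces e_1, e_2, and their difference to zero simultaneously, which is how CCC makes the load variation of either axis visible to both control loops.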
... The synchronization and control of parallel algorithms raises a wealth of problems in areas ranging from numerical analysis to parallel language design, communication in parallel systems, and performance analysis. ...
Article
Understanding synchronization is important for a parallel programming tool that uses dependence analysis as the basis for advising programmers on the correctness of parallel constructs. This paper discusses static analysis methods that can be applied to parallel programs with event variable synchronization. The objective is to be able to predict potential data races in a parallel program. The focus is on how dependencies and synchronization statements inside loops can be used to analyze complete programs with parallel loop and parallel case style parallelism.
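The kind of loop-carried dependence such an analysis must track can be shown in miniature. The sketch below uses Fortran with OpenMP rather than the event-variable dialect the paper analyzes: with the ordered construct, iteration i cannot read a(i-1) until iteration i-1 has written it; removing the construct introduces exactly the data race a dependence-based tool is meant to predict.

    program loop_dependence_sketch
      implicit none
      integer :: i
      real :: a(0:100)
      a(0) = 1.0
      ! The write of a(i) and the read of a(i-1) form a loop-carried
      ! dependence; ORDERED serializes just this pair of statements.
      !$omp parallel do ordered
      do i = 1, 100
         !$omp ordered
         a(i) = a(i-1) + 1.0
         !$omp end ordered
      end do
      !$omp end parallel do
      print *, a(100)   ! 101.0 in every run; racy without ORDERED
    end program loop_dependence_sketch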
Chapter
If a looping construct has the property that no iteration of the loop requires for its execution a result from a previous iteration, correct results can be obtained by executing all of the loop iterations simultaneously on different processors. At Myrias Research Corporation, a virtual machine has been designed, and is being implemented, on which such parallel execution of loop constructs is achieved by replacing the looping instruction (“do” in ANSI Fortran 77) with a modified instruction (“par do” in Myrias Parallel Fortran — MPF). Each loop iteration executes in its own, separate memory space, and these memory spaces are automatically merged when all of the tasks have completed. The architecture of the Myrias parallel computer is described briefly. We describe a new memory model which utilizes local, distributed memory, and allows for the dynamic reconfiguration of parallel tasks at the operating system level. Such a model gives rise to a powerful new parallel programming method. Further, we describe how recursive parallel methods (RPM) can be used effectively. The three main language extensions are orthogonal, and their combination provides easy access to flexible, high order, adaptive algorithms. Algorithmic examples of the application of this programming model are demonstrated.
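As a loose analog only (OpenMP, not Myrias Parallel Fortran): privatized variables give each iteration isolated state, and a reduction merges that state once all iterations complete, mirroring in a small way MPF's merge of per-task memory spaces.

    program par_do_analog
      implicit none
      integer :: i
      real :: partial, total
      total = 0.0
      ! PRIVATE gives each iteration its own 'partial'; REDUCTION merges
      ! the per-thread results at the end, loosely echoing MPF's automatic
      ! merge of per-iteration memory spaces after a 'par do'.
      !$omp parallel do private(partial) reduction(+:total)
      do i = 1, 1000
         partial = 1.0 / real(i)**2
         total = total + partial
      end do
      !$omp end parallel do
      print *, 'sum ~ pi**2/6:', total
    end program par_do_analog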
Conference Paper
Full-text available
We describe a debugger that is being developed for distributed programs in Amoeba. A major goal in our work is to make the debugger independent of the Amoeba kernel. Our design integrates many facilities found in other debuggers, such as execution replay, ...
Conference Paper
The GePaRD programming environment (General Parallel Runtime environment on Distributed systems) is a tool for programming parallel and heterogeneous process systems on distributed-memory parallel computers. The special properties of the system (locality of processes, decentralized control, global structuring) yield a high degree of modularity and flexibility in program systems. In the following, the basic ideas and concepts of GePaRD are presented without going into specification or implementation details. Instead, a programming model is presented that realizes complex communication and synchronization mechanisms solely on the basis of local processes and message passing.
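A minimal sketch of that style in Fortran with MPI (an assumption of this note; GePaRD has its own interface): each rank holds only local state and coordinates purely by exchanging messages.

    program token_ring_sketch
      use mpi
      implicit none
      integer :: rank, nprocs, ierr, token
      integer :: status(MPI_STATUS_SIZE)
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      ! Pass a token around a ring: all coordination is local processes
      ! plus point-to-point messages, with no shared state anywhere.
      if (nprocs < 2) then
         print *, 'run with at least 2 ranks'
      else if (rank == 0) then
         token = 42
         call MPI_Send(token, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
         call MPI_Recv(token, 1, MPI_INTEGER, nprocs-1, 0, &
                       MPI_COMM_WORLD, status, ierr)
         print *, 'token returned:', token
      else
         call MPI_Recv(token, 1, MPI_INTEGER, rank-1, 0, &
                       MPI_COMM_WORLD, status, ierr)
         call MPI_Send(token, 1, MPI_INTEGER, mod(rank+1, nprocs), 0, &
                       MPI_COMM_WORLD, ierr)
      end if
      call MPI_Finalize(ierr)
    end program token_ring_sketch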
Article
An adaptive task partitioning scheme for MIMD architectures is investigated. For many serial adaptive procedures this methodology provides a direct translation into reasonably efficient parallel versions. A prototype of a two-dimensional integration method has been implemented in this manner. This was facilitated by using a set of high-level macros, layered over the Argonne macro package, which provides the primitives in the adaptive partitioning scheme. Because of the portable nature of the Argonne macro package, our code should be readily ported to other MIMD machines with shared memory. A version of the macros for other MIMD architectures is also possible.
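The same adaptive-partitioning idea can be sketched in modern terms, with OpenMP tasks standing in for the Argonne macros (an analogy, not the paper's code): every subinterval whose estimate is not yet accurate enough is split, and its halves become new parallel tasks, so the adaptive refinement is itself the source of parallelism.

    program adaptive_quad_sketch
      implicit none
      real :: result
      !$omp parallel
      !$omp single
      result = adapt(0.0, 1.0, 1.0e-6)
      !$omp end single
      !$omp end parallel
      print *, 'integral of 4/(1+x**2) on [0,1] ~ pi:', result
    contains
      real function f(x)
        real, intent(in) :: x
        f = 4.0 / (1.0 + x*x)
      end function f
      recursive function adapt(a, b, tol) result(s)
        real, intent(in) :: a, b, tol
        real :: s, m, whole, halves, s1, s2
        m = 0.5*(a + b)
        whole  = 0.5 *(b - a)*(f(a) + f(b))            ! one trapezoid
        halves = 0.25*(b - a)*(f(a) + 2.0*f(m) + f(b)) ! two trapezoids
        if (abs(halves - whole) < tol) then
           s = halves
        else
           ! Unconverged subintervals are partitioned into new tasks.
           !$omp task shared(s1)
           s1 = adapt(a, m, 0.5*tol)
           !$omp end task
           !$omp task shared(s2)
           s2 = adapt(m, b, 0.5*tol)
           !$omp end task
           !$omp taskwait
           s = s1 + s2
        end if
      end function adapt
    end program adaptive_quad_sketch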
Article
A barrier is a method for synchronizing a large number of concurrent computer processes. After considering some basic synchronization mechanisms, a collection of barrier algorithms with either linear or logarithmic depth is presented. A graphical model is described that profiles the execution of the barriers and other parallel programming constructs. This model shows how the interaction between the barrier algorithms and the work that they synchronize can impact their performance. One result is that logarithmic tree-structured barriers show good performance when synchronizing fixed-length work, while linear self-scheduled barriers show better performance when synchronizing fixed-length work with an embedded critical section. The linear barriers are better able to exploit the process skew associated with critical sections. Timing experiments that support these conclusions, performed on an eighteen-processor Flex/32 shared-memory multiprocessor, are detailed.
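The linear (central-counter) species of barrier can be sketched in a few lines. This version uses OpenMP atomics and the classic sense-reversing counter (an illustration, not the paper's Flex/32 code); its depth is linear because every process updates and spins on the same counter and flag.

    program counter_barrier_sketch
      use omp_lib
      implicit none
      integer :: count, sense
      integer :: my_sense, arrived, snap
      count = 0
      sense = 0
      !$omp parallel private(my_sense, arrived, snap)
      my_sense = 1    ! the flag value this barrier episode releases on
      !$omp atomic capture
      count = count + 1
      arrived = count
      !$omp end atomic
      if (arrived == omp_get_num_threads()) then
         count = 0                  ! reset for a later episode
         !$omp atomic write
         sense = my_sense           ! last arriver releases everyone
      else
         do                         ! linear depth: all spin on one flag
            !$omp atomic read
            snap = sense
            if (snap == my_sense) exit
         end do
      end if
      print *, 'thread', omp_get_thread_num(), 'passed the barrier'
      !$omp end parallel
    end program counter_barrier_sketch

(A production version would add explicit memory-ordering clauses; the sketch leans on the atomics for visibility.)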
Article
In this paper we describe the features and semantics of ParC. The rest of this section explains the motivation for designing a new language, the effect of the motivating forces on the design, and the structure of the software environment that surrounds it. The next section describes the parallel constructs and scoping rules. The exact semantics of parallel constructs when there are more activities than processors have been widely neglected in the literature. We discuss this issue and provide guidelines for acceptable implementations. We then describe the innovative instructions for forced termination, which are based on analogies with C instructions that break out of a construct, followed by a discussion of synchronization mechanisms. A discussion of the programming methodology of ParC is then given, followed by a discussion of our experiences with ParC. A comparison of ParC with other parallel programming languages is deferred until the end of the paper, after we have described all of its features.
Article
Contemporary parallel programming languages often provide only a few low-level primitives for pairwise communication and synchronization. These primitives are not always suitable for the interactions being programmed. Programming would be easier if it were possible to tailor communication and synchronization mechanisms to fit the needs of the application, much as abstract data types are used to create application-specific data structures and operations. This should also include the possibility of expressing interactions among multiple processes at once. Communicators support this paradigm by creating abstract communication objects that provide a framework for interprocess multiparty interactions. The behavior of these objects is defined in terms of interactions, in which multiple processes can enroll. Interactions are performed when all the roles are filled by ready processes. Nondeterminism is used when the order of interaction performance is immaterial. Interactions can also ...
Article
We present an extension to the FORTRAN language that allows the user to specify parallelism by means of clearly defined, nestable blocks. The implementation achieves compiler independence through a portable preprocessor. High performance is obtained by prespawning processes and relying on a set of run-time routines to manage a self-scheduling allocation scheme. The resulting system, which we call ParFOR, lends itself to the exploitation of fine-grained parallelism because of its low scheduling overhead. It encourages the elimination of explicit process synchronization, thereby enhancing the readability of the source program. In addition, ParFOR provides a variety of modes and compile-time options that are useful for performance measurement and debugging. Finally, we present an evaluation of system efficiency, including timing results for several parallel applications running on the eight-processor Ultracomputer prototype. 1. Introduction It has been shown that parallel programming can b...
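ParFOR's self-scheduling allocation, in which idle processes claim the next unclaimed block of iterations, is the ancestor of today's dynamic loop schedules; a sketch in OpenMP terms (an analogy, not ParFOR's syntax):

    program self_schedule_sketch
      implicit none
      integer :: i
      real :: a(1000)
      ! DYNAMIC hands out chunks of 16 iterations to whichever thread is
      ! idle: the self-scheduling scheme ParFOR realizes with prespawned
      ! processes and low-overhead run-time routines.
      !$omp parallel do schedule(dynamic, 16)
      do i = 1, 1000
         a(i) = sin(real(i))    ! imbalanced work would be load-balanced
      end do
      !$omp end parallel do
      print *, a(500)
    end program self_schedule_sketch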
Article
Full-text available
In the process of learning how to write code for the Denelcor HEP, we have developed an approach that others may well find useful. We believe that the basic synchronization primitives of the HEP (i.e., asynchronous variables), along with the prototypical patterns for their use given in the HEP FORTRAN 77 User's Guide, form too low-level a conceptual basis for the formulation of multiprocessing algorithms. We advocate the use of monitors, which can be easily implemented using the HEP primitives. Attempts to solve substantial problems without introducing higher-level constructs such as monitors can produce code that is unreliable, unintelligible, and restricted to the specific dialect of FORTRAN currently supported on the HEP. Our experience leads us to believe that solutions which are both clear and efficient can be formulated using monitors.
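A monitor in the sense advocated here is simply shared state whose every operation runs under mutual exclusion. A minimal sketch with OpenMP locks (the HEP itself built monitors from the full/empty bits of asynchronous variables, which this sketch does not reproduce):

    module counter_monitor
      use omp_lib
      implicit none
      integer(omp_lock_kind) :: lock
      integer :: value = 0
    contains
      subroutine monitor_init()
        call omp_init_lock(lock)
      end subroutine monitor_init
      subroutine increment()
        ! Every operation enters the monitor before touching the shared
        ! state, so updates can never interleave.
        call omp_set_lock(lock)
        value = value + 1
        call omp_unset_lock(lock)
      end subroutine increment
    end module counter_monitor

    program monitor_sketch
      use counter_monitor
      implicit none
      integer :: i
      call monitor_init()
      !$omp parallel do
      do i = 1, 10000
         call increment()
      end do
      !$omp end parallel do
      print *, 'value (always 10000):', value
    end program monitor_sketch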
Article
Full-text available
In this report we give a detailed presentation of how monitors can be implemented on the HEP using a simple macro processor. We then develop the thesis that a small body of general-purpose monitors can be defined to handle most standard synchronization patterns. We include the macro packages required to implement some of the more common synchronization patterns, including the fairly complex logic discussed in a previous paper. Code produced using these macro packages is portable from one multiprocessing environment to another. Indeed, by recoding the set of basic macros (about 100 lines of code for the Denelcor HEP), most programs that we are now writing could be moved to any similar multiprocessing system.
Article
This paper suggests that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method. When combined with a development of Dijkstra's guarded command, these concepts are surprisingly versatile. Their use is illustrated by sample solutions of a variety of familiar programming exercises.
Book
This paper suggests that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method. When combined with a development of Dijkstra's guarded command, these concepts are surprisingly versatile. Their use is illustrated by sample solutions of a variety of familiar programming exercises.
Article
This chapter is intended for all those who expect that in their future activities they will become seriously involved in the problems that arise in either the design or the more advanced applications of digital information processing equipment; it is further intended for all those who are just interested in information processing.
Article
This paper presents a proposal for structured representation of multiprogramming in a high-level language. The notation used explicitly associates a data structure shared by concurrent processes with the operations defined on it. This clarifies the meaning of programs and permits a large class of time-dependent errors to be caught at compile time. A combination of critical regions and event variables enables the programmer to control scheduling of resources among competing processes to any degree desired. These concepts are sufficiently safe to use not only within operating systems but also within user programs.
Article
The analysis of aerospace structures by the finite element method consumes considerable computer time. The cost of this resource and the designer's desire to have rapid feedback concerning such questions as the effect of a change in loading of the structure or in a parameter of some structural material led to the design of a special purpose parallel computing system for finite element analysis. As a special purpose computer, the architecture of this finite element computer is closely tied to computational aspects of the particular problem. Various aspects of an MIMD array of microprocessors are related to the requirements of the class of finite element analysis problems which it is intended to solve.