Balkrishna Ramkumar's research while affiliated with University of Iowa and other places

ELMO: Extending (Sequential) Languages with Migratable Objects - Compiler Support

S.G. Rathnam

Article

November 1997

6 Reads

R. J. Richards

Portable Checkpointing for Heterogeneous Architectures

S. G. Rathnam

Efficient task migration is an important feature in parallel and distributed programs, in particular to support checkpointing and recovery for fault tolerance. It is also very useful in distributed environments like networks of workstations where external loads are often unpredictable and dynamic in nature. We propose simple language extensions (ELMO) to existing sequential programming languages like C, Fortran or C++ that provide an object-based task-parallel execution model. Tasks may be dynamically created, are location transparent, and may be migrated or checkpointed transparently by the system. In this paper, ELMO's language features together with the requisite compiler support are presented. 1 Introduction Networks of workstations present some important problems that need to be addressed before they can be used effectively as a parallel computer. (a) Since local-area networks are not explicitly designed for parallel computing, such systems exhibit high communication latencies wh...

Conference Paper

July 1997

16 Reads

100 Citations

Portable Checkpointing for Heterogeneous Architectures

Current approaches for checkpointing assume system homogeneity, where checkpointing and recovery are both performed on the same processor architecture and operating system configuration. Sometimes it is desirable or necessary to recover a failed computation on a different processor architecture. For such situations checkpointing and recovery must be portable. In this paper, we argue that source-to-source compilation is an appropriate concept for this purpose. We describe the compilation techniques that we developed for the design of the c2ftc prototype. The c2ftc compiler enables machine-independent checkpoints by automatic generation of checkpointing and recovery code. Sequential C programs are compiled into fault tolerant C programs, whose checkpoints can be migrated across heterogeneous networks, and restarted on binary incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies.

Article

June 1997

10 Reads

33 Citations

Perspectives for High Performance Computing in Workstation Networks

Current approaches for checkpointing assume system homogeneity, where checkpointing and recovery are both performed on the same processor architecture and operating system configuration. Sometimes it is desirable or necessary to recover a failed computation on a different processor architecture. For such situations checkpointing and recovery must be portable. In this paper, we argue that source-to-source compilation is an appropriate concept for this purpose. We describe the compilation techniques that we developed for the design of the c2ftc prototype. The c2ftc compiler enables machine-independent checkpoints by automatic generation of checkpointing and recovery code. Sequential C programs are compiled into fault tolerant C programs, whose checkpoints can be migrated across heterogeneous networks, and restarted on binary incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies.

June 1997

69 Reads

1 Citation

Networks of workstations have become increasingly popular for high performance computing. However, in order to become a real alternative for MPPs, reliability and efficiency issues must be tackled. In this paper, we identify the key challenges for very large workstation networks, and describe implementation techniques at system software level to overcome these problems. CROWN, a testbed for experimenting with these mechanisms is briefly discussed. 1 Introduction High Performance Computing (HPC) to date has been attempted via one of three primary approaches: Vector-supercomputers (e.g., Cray), Massively Parallel Processors (MPPs, such as the Intel Paragon), and Networks of Workstations (NOWs). Vector supercomputers are expensive and MPPs are not cost-effective and often considered difficult to program. NOWs suffer from high latencies and scheduling overhead that is endemic in time-shared environments. Figure 1 projects the growth of workstation performance over the next two decades an...

ProperTEST: A portable parallel test generator for sequential circuits

Article

May 1997

13 Reads

4 Citations

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Perspectives on high performance network computing

Prithviraj Banerjee

Parallel algorithms developed for CAD problems today suffer from two important drawbacks. First, they are machine specific, and tend to perform poorly on architectures other than the one for which they were designed. Second, the quality of results degrades significantly during parallel execution. In this paper, we address these two problems for an important CAD application: test generation for sequential circuits. We have developed a new parallel test generator, ProperTEST, that is portable across a range of MIMD parallel architectures. This work is part of the ProperCAD project which aims to develop CAD algorithms that run unchanged on shared and nonshared memory machines. We present performance data for ProperTEST on ISCAS 89 sequential circuits on a Sequent Symmetry, an Intel i860 hypercube, an NCUBE/2 hypercube, a network of Sun workstations, and an Encore Multimax. Parallel processing can also be used to improve on the fault coverage possible on one processor in a given amount of time. This was not possible in earlier approaches due to search anomalies. Using ProperTEST, we provide results on ISCAS 89 benchmark programs demonstrating the improvements in fault coverage as the number of processors is increased.

Article

April 1997

6 Reads

1 Citation

Future Generation Computer Systems

Networks of workstations have become increasingly popular for high performance computing. However, in order to become a real alternative for Massively Parallel Processors (MPPs), reliability and efficiency issues must be tackled. In this paper, we identify the key challenges for very large workstation networks, and describe implementation techniques at system software level to overcome these problems.

ProperSYN: A Portable Parallel Algorithm for Logic Synthesis

Article

February 1997

12 Reads

2 Citations

Kaushik De

The Charm Parallel Programming Language and System

Prithviraj Banerjee

Parallel processing is fast becoming an attractive solution to reduce the computation time of CAD applications. Much of the work in parallel algorithms for CAD reported to date, however, suffers from a major limitation. The parallel algorithms proposed for the CAD applications are designed with a specific underlying parallel architecture in mind. We have developed a portable parallel algorithm based on the Transduction method [1], called ProperSYN. The same algorithm runs on a variety of parallel machines. Experimental results on various parallel machines are presented. 1 Introduction Combinational logic synthesis deals with the optimization of logic to realize a specific combinational function [2], [1]. Logic synthesis for large circuits has tremendous computing times and memory requirements. Parallel processing offers an attractive solution to this problem, hence researchers have started to investigate parallel algorithms for logic synthesis and verification [3] [4] [5]. Much of t...

... In this work, we explore the parallel capabilities of the spiking neural network simulator, STACS (Simulation Tool for Asynchronous Cortical Streams) [15], which seeks to address some of these challenges to scaling neural simulations. STACS was developed to be parallel from the ground up, building on top of the Charm++ parallel programming framework, which expresses a paradigm of asynchronous message-driven parallel objects [18,19]. Here, STACS takes advantage of the multicast communication pattern supported by Charm++ to match the irregular communication workload of biological scale models. ...
Reference:
Scaling neural simulations in STACS

Citing Article

... The first phase, ProperCAD I, involved the use of the C-based Charm language and runtime system [18, 19]. As part of ProperCAD I, a suite of parallel applications was developed that address the most significant tasks in VLSI design automation including circuit extraction [20], test generation [21], and logic synthesis [22]. An earlier version of our placement tool, ProperPLACE, was also developed using ProperCAD I [23]. ...
Reference:
An evaluation of parallel simulated annealing strategies with application to standard cell placement

ProperEXT: A Portable Parallel Algorithm for VLSI Circuit Extraction

Citing Article

The Reduce-OR process model for parallel logic programming on non-shared memory machines

Prith Banerjee

... This is in fact what happens in a pure OR parallel system such as Aurora [10] or Muse [2], which build a "cactus stack" of bindings in a shared memory space; an interior node with n branches is the stack frame for a call with n successful matches. Kalé's Reduce-OR process model is an example of a process oriented system, similar to the AND/OR model, that can operate on all results simultaneously [8]. Kalé calls this form of AND parallelism "consumer instance parallelism", reflecting the fact that one invocation of the continuation (the consumer) is made for each result sent. ...
Reference:
Continuation-Based Control in the Implementation of Parallel Logic Programs.

Citing Article

L. V. Kalé

A Chare kernel implementation of a parallel Prolog compiler

... Functional languages have also supported parallel architecture for decades. Compilers for Prolog[34, 50, 23], Lisp[35], and other similar languages[9, 12, 20, 66] have been implemented. Main-stream languages such as C, C++, and Java, also have robust support for parallel hardware[26, 27, 3, 71, 17]. Versions of Ada, which was heavily supported by federal funding, also have support for parallel architectures[14]. ...
Reference:
On the Marketing of Multicore

Citing Article
March 1990

ACM SIGPLAN Notices

Portable Checkpointing for Heterogeneous Architectures

L. V. Kale

... Several researchers have proposed migration techniques designed to work with strongly-typed programming languages, like Java [28] and Emerald [24]. Many more have proposed migration and/or CPR through the instrumentation of well-typed C code [4, 10, 11, 14, 19, 23, 25, 26]. The definition of well-typed code is different among these works, with some work supporting certain type-unsafe constructs. ...
Reference:
Execution Migration in a Heterogeneous-ISA Chip Multiprocessor

Citing Article
June 1997

Prioritization in parallel symbolic computing

... Together, asynchronous messaging and object-based virtualization enable the dynamic overlap of communication and computation: a processor may overlap messaging latency with not just the sending object's succeeding computation, but also with useful computation of other objects on that processor. Previous work [7], [8], [9] has been done on top of CHARM++ to study the load balancing issues in state space search problems. Below, we describe some of the key features of the CHARM++ model that pertain to the search engine. ...
Reference:
An Adaptive Framework for Large-Scale State Space Search

Citing Chapter
April 2006

... This optimizes communication but no core has an idea of the global situation and, as we argue in this paper, this leads to suboptimal task assignment and exploration. We also refer the reader to Shu and Kale [1989], Saletore and Kale [1990], Kalé et al. [1992], Sinha and Kalé [1993], Abu-Khzam et al. [2007], Sun et al. [2011], Weerapurage et al. [2011] for further ideas that have focused on how to explore tasks efficiently in parallel. These works all aim to describe search tree distribution paradigms that are applicable to any branch-and-bound or branching FPT algorithm, of which the framework of Abu-Khzam et al. [2015] is the latest, to our knowledge. ...
Reference:
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms

Prioritization in Parallel Symbolic Computing.

Citing Conference Paper
January 1992

... To our knowledge, the P C 3 system is the first system to provide all of these features. Previous work on portable checkpointing [4, 14, 16] has focused on uniprocessor applications. Systems based on over-decomposition such as AMPI [6] provide either nontransparent support for portable checkpointing or use operating system checkpointing primitives that cannot be used on different architectures. ...
Reference:
Mobile MPI programs in computational Grids

Portable Checkpointing for Heterogeneous Architectures.

Citing Conference Paper
January 1997

Compiled Execution of the Reduce-OR Process Model on Multiprocessors.

... We have run a series of experiments in granularity control using two existing parallel logic programming systems: ROLOG and &-Prolog. ROLOG is a pure logic programming system based on Kale's reduce-or model [11,18]; programs are annotated for parallelism by the user. &-Prolog is a parallel Prolog system based on strict and non-strict independence [7] which uses a modified RAP-WAM abstract machine [6], and where annotations for parallelism can be automatic or user-provided. ...
Reference:
Task Granularity Analysis in Logic Programs

Citing Conference Paper
January 1989

Joining AND Parallel Solutions in AND/OR Parallel Systems.

Laxmikant V. Kalé

... The backtracking problem can be solved when AND-parallelism is combined with OR-parallelism because backtracking in such systems becomes unnecessary [34]. Various execution models and systems exploiting both AND- and OR-parallelism have been proposed in recent years [8,20,32]. Such models usually overcome some problems when only one form of parallelism is supported, and make the best use of the underlying parallel execution facilities. ...
Reference:
Exploiting And-Parallelism and Combined And/Or-Parallelism in Logic Programs: A Survey

Citing Conference Paper
January 1990

Laxmikant V. Kalé