Balkrishna Ramkumar's research while affiliated with University of Iowa and other places


Publications (51)


Prioritization in parallel symbolic computing
  • Chapter

April 2006

·

10 Reads

·

12 Citations

L. V. Kale

·

B. Ramkumar

·

A. B. Sinha

It is argued that scheduling is an important determinant of performance for many parallel symbolic computations, in addition to the issues of dynamic load balancing and grain size control. We propose associating unbounded levels of priorities with tasks and messages as the mechanism of choice for specifying scheduling strategies. We demonstrate how priorities can be used in parallelizing computations in different search domains, and show how priorities can be implemented effectively in parallel systems. Priorities have been implemented in the Charm portable parallel programming system. Performance results on shared-memory machines with tens of processors and nonshared-memory machines with hundreds of processors are given. Open problems for prioritization in specific domains are given, which will constitute a fertile area for future research in this field.
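As a rough illustration of the idea of unbounded priority levels attached to tasks, the sketch below builds bit-vector priorities for the nodes of a search tree and schedules them from a priority queue. It is not the Charm implementation: the names `child_priority` and `PrioritizedQueue`, the fixed-width encoding of a child's rank, and the rule that lexicographically smaller bit strings run first are all assumptions made for this example.

```python
import heapq
from itertools import count


def child_priority(parent: str, index: int, fanout: int) -> str:
    """Extend the parent's bit-vector with the child's rank (assumed encoding).

    The rank is written with just enough bits to number `fanout` children,
    so priorities can grow without bound as the tree gets deeper.
    """
    width = max(1, (fanout - 1).bit_length())
    return parent + format(index, f"0{width}b")


class PrioritizedQueue:
    """Task queue ordered by bit-vector priority (smaller string runs first)."""

    def __init__(self):
        self._heap = []
        self._tie = count()  # keeps insertion order among equal priorities

    def push(self, priority: str, task):
        heapq.heappush(self._heap, (priority, next(self._tie), task))

    def pop(self):
        priority, _, task = heapq.heappop(self._heap)
        return priority, task

    def __len__(self):
        return len(self._heap)


# Hypothetical usage: the root has priority "" and two children.
queue = PrioritizedQueue()
left = child_priority("", 0, 2)    # "0"
right = child_priority("", 1, 2)   # "1"
queue.push(right, "right subtree")
queue.push(left, "left subtree")
queue.push(child_priority(left, 1, 2), "left subtree, second child")
order = [queue.pop()[1] for _ in range(len(queue))]
```

Under these assumptions every task in the left subtree outranks every task in the right subtree, so idle processors are steered toward the leftmost unexplored part of the tree.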


Fig. 1. Illustration of the assignment of bit-vector priorities to nodes. The priority of the topmost node is assumed to be X.  
Fig. 2. The prioritization strategy leads to a characteristic broom-stick sweep of the search space.  
Table 2. Execution times (in seconds) of the ProperTEST test pattern generator for sequential circuits on selected ISCAS89 sequential benchmark circuits on the Intel i860 hypercube.
Fig. 3. The nature of inherent parallelism in IDA*.  
Fig. 6. Performance of a prioritization scheme for game tree search, on Sequent Symmetry. Speedups are relative to one processor speeds.


Prioritization in Parallel Symbolic Computing
  • Article
  • Full-text available

April 2001

·

102 Reads

·

11 Citations

It is argued that scheduling is an important determinant of performance for many parallel symbolic computations, in addition to the issues of dynamic load balancing and grain size control. We propose associating unbounded levels of priorities with tasks and messages as the mechanism of choice for specifying scheduling strategies. We demonstrate how priorities can be used in parallelizing computations in different search domains, and show how priorities can be implemented effectively in parallel systems. Priorities have been implemented in the Charm portable parallel programming system. Performance results on shared-memory machines with tens of processors and nonshared-memory machines with hundreds of processors are given. Open problems for prioritization in specific domains are given, which will constitute a fertile area for future research in this field.


ELMO: extending (sequential) languages with migratable objects-compiler support

January 1998

·

12 Reads

·

1 Citation

Efficient task migration is an important feature in parallel and distributed programs, in particular to support checkpointing and recovery for fault tolerance. It is also very useful in distributed environments like networks of workstations where external loads are often unpredictable and dynamic in nature. We propose simple language extensions (ELMO) to existing sequential programming languages like C, Fortran or C++, that provide an object-based task-parallel execution model. Tasks may be dynamically created, are location transparent, and may be migrated or checkpointed transparently by the system. ELMO's language features, together with the requisite compiler support, are presented.


ELMO: Extending (Sequential) Languages with Migratable Objects - Compiler Support

November 1997

·

6 Reads

Efficient task migration is an important feature in parallel and distributed programs, in particular to support checkpointing and recovery for fault tolerance. It is also very useful in distributed environments like networks of workstations where external loads are often unpredictable and dynamic in nature. We propose simple language extensions (ELMO) to existing sequential programming languages like C, Fortran or C++, that provide an object-based task-parallel execution model. Tasks may be dynamically created, are location transparent, and may be migrated or checkpointed transparently by the system. In this paper, ELMO's language features, together with the requisite compiler support, are presented. 1 Introduction Networks of workstations present some important problems that need to be addressed before they can be used effectively as a parallel computer. (a) Since local-area networks are not explicitly designed for parallel computing, such systems exhibit high communication latencies wh...


Portable Checkpointing for Heterogeneous Architectures

July 1997

·

16 Reads

·

100 Citations

Current approaches for checkpointing assume system homogeneity, where checkpointing and recovery are both performed on the same processor architecture and operating system configuration. Sometimes it is desirable or necessary to recover a failed computation on a different processor architecture. For such situations checkpointing and recovery must be portable. In this paper, we argue that source-to-source compilation is an appropriate concept for this purpose. We describe the compilation techniques that we developed for the design of the c2ftc prototype. The c2ftc compiler enables machine-independent checkpoints by automatic generation of checkpointing and recovery code. Sequential C programs are compiled into fault tolerant C programs, whose checkpoints can be migrated across heterogeneous networks, and restarted on binary incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies.
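To make the notion of a machine-independent checkpoint concrete, here is a minimal sketch that writes program state in a fixed byte order and fixed field sizes, so a file saved on one architecture can be read back on another. It does not reproduce c2ftc, which generates checkpointing code for C programs; the state layout (a step counter plus a list of floats), the `CKPT` magic tag, and the function names are hypothetical choices for this example.

```python
import struct
from typing import List, Tuple

# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so the layout does not depend on the host architecture.
_HEADER = struct.Struct("!4sI")  # magic tag, number of float values
MAGIC = b"CKPT"                  # hypothetical tag for this sketch


def save_checkpoint(step: int, values: List[float]) -> bytes:
    """Encode the (assumed) program state as a portable byte string."""
    header = _HEADER.pack(MAGIC, len(values))
    body = struct.pack(f"!q{len(values)}d", step, *values)
    return header + body


def load_checkpoint(blob: bytes) -> Tuple[int, List[float]]:
    """Decode a checkpoint produced by save_checkpoint on any host."""
    magic, n = _HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a checkpoint produced by this sketch")
    step, *values = struct.unpack_from(f"!q{n}d", blob, _HEADER.size)
    return step, list(values)


# Hypothetical usage: save after step 7, then recover.
blob = save_checkpoint(7, [1.5, -2.25, 3.0])
restored_step, restored_values = load_checkpoint(blob)
```

The design choice being illustrated is only the explicit, host-independent data representation; a real portable checkpointer also has to capture the call stack and pointers, which this sketch leaves out.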


Portable Checkpointing for Heterogeneous Architectures

June 1997

·

10 Reads

·

33 Citations

Current approaches for checkpointing assume system homogeneity, where checkpointing and recovery are both performed on the same processor architecture and operating system configuration. Sometimes it is desirable or necessary to recover a failed computation on a different processor architecture. For such situations checkpointing and recovery must be portable. In this paper, we argue that source-to-source compilation is an appropriate concept for this purpose. We describe the compilation techniques that we developed for the design of the c2ftc prototype. The c2ftc compiler enables machine-independent checkpoints by automatic generation of checkpointing and recovery code. Sequential C programs are compiled into fault tolerant C programs, whose checkpoints can be migrated across heterogeneous networks, and restarted on binary incompatible architectures. Experimental results on several systems provide evidence that the performance penalty of portable checkpointing is negligible for realistic checkpointing frequencies.


Fig. 1. Future performance development.
Fig. 2. Probability of availability.
Fig. 3. Dependency of effective runtime on number of processors and number of checkpoints. The attributes at the poles give the time units between checkpointing.

Perspectives for High Performance Computing in Workstation Networks

June 1997

·

69 Reads

·

1 Citation

Networks of workstations have become increasingly popular for high performance computing. However, in order to become a real alternative for MPPs, reliability and efficiency issues must be tackled. In this paper, we identify the key challenges for very large workstation networks, and describe implementation techniques at system software level to overcome these problems. CROWN, a testbed for experimenting with these mechanisms, is briefly discussed. 1 Introduction High Performance Computing (HPC) to date has been attempted via one of three primary approaches: Vector-supercomputers (e.g., Cray), Massively Parallel Processors (MPPs, such as the Intel Paragon), and Networks of Workstations (NOWs). Vector supercomputers are expensive and MPPs are not cost-effective and often considered difficult to program. NOWs suffer from high latencies and scheduling overhead that is endemic in time-shared environments. Figure 1 projects the growth of workstation performance over the next two decades an...


ProperTEST: A portable parallel test generator for sequential circuits

May 1997

·

13 Reads

·

4 Citations

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Parallel algorithms developed for CAD problems today suffer from two important drawbacks. First, they are machine specific, and tend to perform poorly on architectures other than the one for which they were designed. Second, the quality of results degrades significantly during parallel execution. In this paper, we address these two problems for an important CAD application: test generation for sequential circuits. We have developed a new parallel test generator, ProperTEST, that is portable across a range of MIMD parallel architectures. This work is part of the ProperCAD project which aims to develop CAD algorithms that run unchanged on shared and nonshared memory machines. We present performance data for ProperTEST on ISCAS 89 sequential circuits on a Sequent Symmetry, an Intel i860 hypercube, an NCUBE/2 hypercube, a network of Sun workstations, and an Encore Multimax. Parallel processing can also be used to improve on the fault coverage possible on one processor in a given amount of time. This was not possible in earlier approaches due to search anomalies. Using ProperTEST, we provide results on ISCAS 89 benchmark programs demonstrating the improvements in fault coverage as the number of processors is increased.


Perspectives on high performance network computing

April 1997

·

6 Reads

·

1 Citation

Future Generation Computer Systems

Networks of workstations have become increasingly popular for high performance computing. However, in order to become a real alternative for Massively Parallel Processors (MPPs), reliability and efficiency issues must be tackled. In this paper, we identify the key challenges for very large workstation networks, and describe implementation techniques at system software level to overcome these problems.


ProperSYN: A Portable Parallel Algorithm for Logic Synthesis

February 1997

·

12 Reads

·

2 Citations

Parallel processing is fast becoming an attractive solution to reduce the computation time of CAD applications. Much of the work in parallel algorithms for CAD reported to date, however, suffers from a major limitation. The parallel algorithms proposed for the CAD applications are designed with a specific underlying parallel architecture in mind. We have developed a portable parallel algorithm based on the Transduction method [1], called ProperSYN. The same algorithm runs on a variety of parallel machines. Experimental results on various parallel machines are presented. 1 Introduction Combinational logic synthesis deals with the optimization of logic to realize a specific combinational function [2], [1]. Logic synthesis for large circuits has tremendous computing time and memory requirements. Parallel processing offers an attractive solution to this problem, hence researchers have started to investigate parallel algorithms for logic synthesis and verification [3] [4] [5]. Much of t...


Citations (31)


... In this work, we explore the parallel capabilities of the spiking neural network simulator, STACS (Simulation Tool for Asynchronous Cortical Streams) [15], which seeks to address some of these challenges to scaling neural simulations. STACS was developed to be parallel from the ground up, building on top of the Charm++ parallel programming framework, which expresses a paradigm of asynchronous message-driven parallel objects [18,19]. Here, STACS takes advantage of the multicast communication pattern supported by Charm++ to match the irregular communication workload of biological scale models. ...

Reference:

Scaling neural simulations in STACS
The Charm Parallel Programming Language and System
  • Citing Article

... The first phase, ProperCAD I, involved the use of the C-based Charm language and runtime system [18, 19]. As part of ProperCAD I, a suite of parallel applications was developed that address the most significant tasks in VLSI design automation including circuit extraction [20], test generation [21], and logic synthesis [22]. An earlier version of our placement tool, ProperPLACE, was also developed using ProperCAD I [23]. ...

Properext: a portable parallel algorithm for vlsi circuit extraction
  • Citing Article

... This is in fact what happens in a pure OR parallel system such as Aurora [10] or Muse [2], which build a "cactus stack" of bindings in a shared memory space; an interior node with n branches is the stack frame for a call with n successful matches. Kalé's Reduce-OR process model is an example of a process oriented system, similar to the AND/OR model, that can operate on all results simultaneously [8]. Kalé calls this form of AND parallelism "consumer instance parallelism", reflecting the fact that one invocation of the continuation (the consumer) is made for each result sent. ...

The Reduce-OR process model for parallel logic programming on non-shared memory machines
  • Citing Article

... Functional languages have also supported parallel architecture for decades. Compilers for Prolog [34, 50, 23], Lisp [35], and other similar languages [9, 12, 20, 66] have been implemented. Main-stream languages such as C, C++, and Java, also have robust support for parallel hardware [26, 27, 3, 71, 17]. Versions of Ada, which was heavily supported by federal funding, also have support for parallel architectures [14]. ...

A Chare kernel implementation of a parallel Prolog compiler
  • Citing Article
  • March 1990

ACM SIGPLAN Notices

... Several researchers have proposed migration techniques designed to work with strongly-typed programming languages, like Java [28] and Emerald [24]. Many more have proposed migration and/or CPR through the instrumentation of well-typed C code [4, 10, 11, 14, 19, 23, 25, 26]. The definition of well-typed code is different among these works, with some work supporting certain type-unsafe constructs. ...

Portable Checkpointing for Heterogeneous Architectures
  • Citing Article
  • June 1997

... Together, asynchronous messaging and object-based virtualization enable the dynamic overlap of communication and computation: a processor may overlap messaging latency with not just the sending object's succeeding computation, but also with useful computation of other objects on that processor. Previous work [7], [8], [9] has been done on top of CHARM++ to study the load balancing issues in state space search problems. Below, we describe some of the key features of the CHARM++ model that pertain to the search engine. ...

Prioritization in parallel symbolic computing
  • Citing Chapter
  • April 2006

... This optimizes communication but no core has an idea of the global situation and, as we argue in this paper, this leads to suboptimal task assignment and exploration. We also refer the reader to Shu and Kale [1989], Saletore and Kale [1990], Kalé et al. [1992], Sinha and Kalé [1993], Abu-Khzam et al. [2007], Sun et al. [2011], Weerapurage et al. [2011] for further ideas that have focused on how to explore tasks efficiently in parallel. These works all aim to describe search tree distribution paradigms that are applicable to any branch-and-bound or branching FPT algorithm, of which the framework of Abu-Khzam et al. [2015] is the latest, to our knowledge. ...

Prioritization in Parallel Symbolic Computing.
  • Citing Conference Paper
  • January 1992

... To our knowledge, the P C 3 system is the first system to provide all of these features. Previous work on portable checkpointing [4, 14, 16] has focused on uniprocessor applications. Systems based on over-decomposition such as AMPI [6] provide either nontransparent support for portable checkpointing or use operating system checkpointing primitives that cannot be used on different architectures. ...

Portable Checkpointing for Heterogenous Architectures.
  • Citing Conference Paper
  • January 1997

... We have run a series of experiments in granularity control using two existing parallel logic programming systems: ROLOG and &-Prolog. ROLOG is a pure logic programming system based on Kale's reduce-or model [11,18]; programs are annotated for parallelism by the user. &-Prolog is a parallel Prolog system based on strict- and non-strict independence [7] which uses a modified RAP-WAM abstract machine [6], and where annotations for parallelism can be automatic or user-provided. ...

Compiled Execution of the Reduce-OR Process Model on Multiprocessors.
  • Citing Conference Paper
  • January 1989

... The backtracking problem can be solved when AND-parallelism is combined with OR-parallelism because backtracking in such systems becomes unnecessary [34]. Various execution models and systems exploiting both AND- and OR-parallelism have been proposed in recent years [8,20,32]. Such models usually overcome some problems when only one form of parallelism is supported, and make the best use of the underlying parallel execution facilities. ...

Joining AND Parallel Solutions in AND/OR Parallel Systems.
  • Citing Conference Paper
  • January 1990