Fig. 4. Neutron tracks through a uranium sphere in MC++.

Source publication
Article
Full-text available
The Parallel Object-Oriented Methods and Applications (POOMA) Framework, written in ANSI/ISO C++, has demonstrated both great expressiveness and efficient code performance for large-scale scientific applications on computing platforms ranging from workstations to massively parallel supercomputers. The POOMA Framework provides high-level abstraction...

Context in source publication

Context 1
... was benchmarked against the MCNP code, a comprehensive neutronics package written in Fortran 77 and parallelized using MPI. We ran the double-density Godiva test problem, which models a bare uranium sphere (see Fig. 4), using each code on a variety of workstations and the Cray T3D. We found that serial code performance was comparable and MC++ had superior parallel scaling properties on the T3D [9]. Because of its relative simplicity, MC++ was easily ported to new parallel architectures and quickly became the first ASCI-relevant physics code to run successfully on all three ASCI computing platforms. This rapid success encouraged several groups of ASCI researchers working on hydrodynamics applications to begin utilizing the POOMA Framework for their code development as well. These researchers now form the Blanca code team, and at present, their POOMA-based hydrodynamics models have utilized over 1000 SGI R10000 processors in parallel and have run calculations on meshes containing as many as 60 million ...

Similar publications

Conference Paper
Full-text available
This paper presents an effort launched in 2006 by the OpenECG network, led by the Graz University of Technology and supported by IEEE 1073, ISO 11073 and CEN TC251 to create a two-way converter in C++ between the SCP-ECG and the HL7 aECG standards. In the conversion, GDF, the BioSig internal data format, was used as an intermediate structure. This...

Citations

... Processing Library (STAPL, [3]) and Parallel Vector Library (PVL, [4]), which orthogonalizes the tasks of writing an application and mapping it onto a parallel processor. Finally, it uses the object-oriented capabilities of C++ to achieve high performance while maintaining a high level of abstraction, similar to previous work done at Los Alamos National Laboratory [5], [6]. ...
... Expression templates allow operations involving multiple high-level objects to be combined at a low level by the compiler, eliminating the need for temporary storage and copies. An example of this is provided by the POOMA library developed at Los Alamos National Laboratory [5], [6]. At the time of this writing, the VSIPL++ and Parallel VSIPL++ standards are under development by the HPEC-SI standards activity. ...
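A minimal sketch of this mechanism (the names Expr, AddExpr, and Vec are hypothetical, not POOMA or VSIPL++ classes): operator+ merely records its operands in a lightweight node, and the assignment operator evaluates the whole expression tree in one fused loop, so a + b + c never materialises a temporary vector.

// Minimal expression-template sketch (hypothetical names, not the POOMA
// classes): r = a + b + c builds a lightweight expression tree and the
// assignment collapses it into one fused loop, with no temporary vectors.
#include <cstddef>
#include <iostream>
#include <type_traits>
#include <vector>

struct Expr {};                                   // tag base for anything usable in expressions

template <class L, class R>
struct AddExpr : Expr {                           // unevaluated node representing lhs + rhs
    AddExpr(const L& l, const R& r) : lhs(l), rhs(r) {}
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

class Vec : public Expr {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double  operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i)       { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    template <class E>                            // assignment from any expression:
    Vec& operator=(const E& expr) {               // the whole tree is evaluated in one pass
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
private:
    std::vector<double> data_;
};

// operator+ only participates for expression types, and merely records the operands.
template <class L, class R,
          class = typename std::enable_if<std::is_base_of<Expr, L>::value &&
                                          std::is_base_of<Expr, R>::value>::type>
AddExpr<L, R> operator+(const L& lhs, const R& rhs) { return AddExpr<L, R>(lhs, rhs); }

int main() {
    Vec a(5, 1.0), b(5, 2.0), c(5, 3.0), r(5);
    r = a + b + c;                                // single loop, no temporaries
    std::cout << r[0] << '\n';                    // prints 6
}

Production libraries extend the same idea to multi-dimensional arrays and a full set of operators, but the compile-time structure is the same.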
... Designating a data object as having one of these special case distributions allows the compiler to better optimize the data reference operations associated with the object. Similar designations were provided in POOMA [6]. ...
Article
Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems, the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at high level, while still achieving performance and preserving the portability of the code across parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for runtime optimization. Parallel arrays and functions in Parallel VSIPL++ also support explicit setup and run stages, which are used to accelerate communication operations. The parallel mapping mechanism provides an external interface that allows optimal mappings to be generated offline and read into the system at runtime. Finally, the standard has been developed in collaboration with high performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance.
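The setup/run split described for the computation objects can be sketched in a few lines of C++; the class WindowedScale and the windowing operation are hypothetical stand-ins, not part of the VSIPL++ interface. The one-time planning cost (here, precomputing a window) is paid in the constructor, while the call operator does only per-frame arithmetic, which is the property that lets a library re-plan or specialize at setup time without slowing the processing loop.

// Sketch of the explicit setup/run split described above (hypothetical
// class, not the VSIPL++ interface): planning work happens once in the
// constructor, the per-frame call does only cheap arithmetic.
#include <cmath>
#include <cstddef>
#include <vector>

class WindowedScale {
public:
    explicit WindowedScale(std::size_t n) : window_(n) {
        const double pi = 3.14159265358979323846;
        for (std::size_t i = 0; i < n; ++i)       // setup: precompute a Hann window once
            window_[i] = 0.5 - 0.5 * std::cos(2.0 * pi * i / (n - 1));
    }

    // Run stage: called on every data frame; no allocation, no planning.
    void operator()(const std::vector<double>& in, std::vector<double>& out) const {
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = window_[i] * in[i];
    }
private:
    std::vector<double> window_;
};

int main() {
    const std::size_t n = 1024;
    WindowedScale window(n);                       // setup once, outside the hot loop
    std::vector<double> frame(n, 1.0), result(n, 0.0);
    for (int block = 0; block < 1000; ++block)     // run stage on streaming frames
        window(frame, result);
}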
... Traditional programming environments emphasize either the coding of components (influenced by an implicit composition style) or the aspect of connecting them together (to prototype complex computations). For instance, when coding effort is paramount and composition is implemented in a distributed objects system (e.g., [22,39]), techniques such as inheritance and templates can be used to create new components. Other implementations involving parallel programming [12,19,31] or multi-agent coordination [25,26] provide comparable facilities (typically APIs) for creating new components. ...
Article
Rapid advances in technological infrastructure as well as the emphasis on application support systems have signaled the maturity of grid computing. Today's grid computing environments (GCEs) extend the notion of a programming environment beyond the compile-schedule-execute paradigm to include functionality such as networked access, information services, data management, and collaborative application composition. In this article, we present GCEs in the context of supporting multidisciplinary communities of scientists and engineers. We present a high-level design framework for building GCEs and a space of characteristics that help identify requirements for GCEs for multidisciplinary communities. By describing integrated systems for five different multidisciplinary communities, we outline the unique responsibility (and opportunity) for GCEs to exploit the larger context of the scientific or engineering application, defined by the ongoing activities of the pertinent community. Finally, we describe several core systems support technologies that we have developed to support multidisciplinary GCE applications.
... There are also several higher-level parallel programming abstractions that use MPI, OpenMP, or POSIX threads, such as implementations of the Bulk-Synchronous Parallel (BSP) model [77, 43, 22] and data-parallel languages like High-Performance Fortran [42]. Higher-level application frameworks such as KeLP [29] and POOMA [27] also abstract away the details of the parallel communication layers. These frameworks enhance the expressiveness of data-parallel languages by providing the user with a high-level programming abstraction for block-structured scientific calculations. ...
Article
Full-text available
The emerging discipline of algorithm engineering has primarily focussed on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
... Other major differences lie in the HPF-like features such as loop parallelism and the extended C++ language syntax. POOMA (Parallel Object-Oriented Methods and Applications) [8] is a collection of C++ template classes for writing high-performance scientific applications. It provides high-level data-parallel types (for instance, high-level abstractions for multi-dimensional arrays, computational meshes, etc.) that make it easy to write parallel PDE (partial differential equation) solvers without worrying about the low-level details of layout, data transfer, and synchronization. ...
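As an illustration of the kind of computation such data-parallel types express, the generic sketch below (plain C++, not POOMA code) spells out one Jacobi sweep of a 5-point Laplace stencil; in a framework of the sort described, the interior update would be written as a single whole-array expression over an index range, with layout, data transfer, and synchronization handled behind that expression.

// One Jacobi sweep for Laplace's equation on a square grid, written out in
// plain C++. This generic sketch (not POOMA code) shows the stencil that a
// single whole-array expression in a data-parallel framework would denote.
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double> >;

void jacobi_sweep(const Grid& u, Grid& u_new) {
    const std::size_t n = u.size();
    for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t j = 1; j + 1 < n; ++j)
            // 5-point stencil: each interior value becomes the average of its neighbours
            u_new[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j] +
                                  u[i][j + 1] + u[i][j - 1]);
}

int main() {
    const std::size_t n = 64;
    Grid u(n, std::vector<double>(n, 0.0)), u_new = u;
    for (std::size_t j = 0; j < n; ++j) u[0][j] = 1.0;      // heated top boundary
    for (int iter = 0; iter < 100; ++iter) {
        jacobi_sweep(u, u_new);
        for (std::size_t i = 1; i + 1 < n; ++i)             // copy interior back, keep boundary
            for (std::size_t j = 1; j + 1 < n; ++j)
                u[i][j] = u_new[i][j];
    }
}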
Article
The concept of design patterns has been extensively studied and applied in the context of object-oriented software design. Similar ideas are being explored in other areas of computing as well. Over the past several years, researchers have been experimenting with the feasibility of employing design-pattern-related concepts in the parallel computing domain. In the past, several pattern-based systems have been developed with the intention of facilitating faster parallel application development through the use of pre-implemented and reusable components that are based on frequently used parallel computing design patterns. However, most of these systems face several serious limitations such as limited flexibility, zero extensibility, and the ad hoc nature of their components. Lack of flexibility in a parallel programming system limits a programmer to using only the high-level components provided by the system. Lack of extensibility here refers to the fact that most of the existing pattern-based parallel programming systems come with a set of pre-built patterns integrated into the system. However, the system provides no obvious way of increasing the repertoire of patterns when the need arises. Also, most of these systems do not offer any generic view of a parallel computing pattern, a fact which may be at the root of several of their shortcomings. This research proposes a generic (i.e., pattern- and application-independent) model for realizing and using parallel design patterns. The term "Parallel Architectural Skeleton" is used to represent the set of generic attributes associated with a pattern. The Parallel Architectural Skeleton Model (PASM) is based on the message-passing paradigm, which makes it suitable for a LAN of workstations and PCs. The model is flexible as it allows the intermixing of high-level patterns with low-level message-passing primitives. An object-oriented and library-based implementation of the model has been completed using C++ and MPI, without necessitating any language extension. The generic model and the library-based implementation allow new patterns to be defined and included into the system. The skeleton library serves as a "framework" for the systematic, hierarchical development of network-oriented parallel applications.
... (ũ · ∇)ũ = −∇p + Δũ (7), ∇ · ũ = 0 ...
Article
An object-oriented (OO) framework for partial differential equations (PDEs) provides software abstractions for numerical simulation of PDEs. The design of such frameworks is not trivial, and the outcome of the design is highly dependent on which mathematical abstractions one chooses to support. In this paper, coordinate-free abstractions for PDEs are advocated. The coordinate-free formulation of a PDE hides the underlying coordinate system. Therefore, software based on these concepts has the prospect of being more modular, since the PDE formulation is separated from the representation of the coordinates. The use of coordinate-free methods in two independent OO frameworks is presented, in order to exemplify the viability of the concepts. The described applications simulate seismic waves for various rock models and the incompressible Navier-Stokes equations on curvilinear grids, respectively. In both cases, the possibility to express the equations in a domain-independent fashion is crucial. Similarities and differences between the two independent frameworks are discussed, and a number of coordinate-free variation points are identified. Variation points are places where a framework is designed for modification. Their identification is of interest both for tentative users of a framework and its developers.
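A minimal sketch of the separation the abstract argues for, with hypothetical class names not taken from either framework: the PDE-level code is written against an abstract set of differential operators, and the coordinate system is confined to one implementation of that interface, so a curvilinear version could be substituted without touching the solver.

// Coordinate-free sketch (hypothetical names): the solver sees only grad
// and div; the coordinate system lives entirely behind the interface.
#include <cstddef>
#include <iostream>
#include <vector>

struct ScalarField { std::vector<double> v; };
struct VectorField { std::vector<double> vx; };   // 1-D fields for brevity

class DifferentialOps {                           // coordinate-free operator interface
public:
    virtual ~DifferentialOps() = default;
    virtual VectorField grad(const ScalarField& f) const = 0;
    virtual ScalarField div (const VectorField& u) const = 0;
};

// One concrete coordinate system; a curvilinear version would fold metric
// terms into these two methods without touching the solver code below.
class Cartesian1D : public DifferentialOps {
public:
    explicit Cartesian1D(double h) : h_(h) {}
    VectorField grad(const ScalarField& f) const override {
        VectorField g; g.vx.assign(f.v.size(), 0.0);
        for (std::size_t i = 1; i + 1 < f.v.size(); ++i)
            g.vx[i] = (f.v[i + 1] - f.v[i - 1]) / (2.0 * h_);   // central difference
        return g;
    }
    ScalarField div(const VectorField& u) const override {
        ScalarField d; d.v.assign(u.vx.size(), 0.0);
        for (std::size_t i = 1; i + 1 < u.vx.size(); ++i)
            d.v[i] = (u.vx[i + 1] - u.vx[i - 1]) / (2.0 * h_);
        return d;
    }
private:
    double h_;
};

// PDE-level code written without reference to any coordinate system.
ScalarField laplacian(const DifferentialOps& ops, const ScalarField& f) {
    return ops.div(ops.grad(f));
}

int main() {
    Cartesian1D ops(0.1);
    ScalarField f;
    for (int i = 0; i < 11; ++i) f.v.push_back(0.01 * i * i);  // f(x) = x^2 on x = 0.1*i
    std::cout << laplacian(ops, f).v[5] << '\n';               // prints 2 (d^2/dx^2 of x^2)
}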
... The exact extent of the duplication or overlap depends on the data dependency patterns (stencils) of the numerical algorithms used, and may vary considerably among different applications. Whereas this problem has found sufficient coverage in the case of structured grids [4, 5], the case of unstructured grids still lacked a general solution. Exploiting the common structure of discretization algorithms, we present a simple algebraic description of their stencils, which can be used to create the necessary overlap automatically and efficiently. ...
... There has been much research work devoted to frameworks for parallel computing in the context of structured grids, see e.g. [4, 5]. The situation is different for unstructured grids, where no comparable work is known to the author. ...
Article
Full-text available
The local data dependency pattern or stencil of a numerical algorithm is a structural property which is important for parallel computations. We present an algebraic notation for stencils on unstructured grids, derive some basic properties of stencils, and introduce two algorithms for constructing grid overlaps based on stencils. Finally, we show how these results lead to a more general and reusable approach to parallel PDE solution.
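The overlap construction that such stencil information drives can be illustrated with a generic sketch, assuming a plain cell-adjacency representation of an unstructured grid; this is a common construction, not the paper's algebraic notation or its specific algorithms.

// Ghost-layer construction for a nearest-neighbour stencil on an
// unstructured grid: the cells the stencil touches that this partition
// does not own form the overlap that must be communicated.
#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

using Adjacency = std::vector<std::vector<int> >;  // cell -> neighbouring cells

std::set<int> ghost_layer(const Adjacency& adj, const std::set<int>& owned) {
    std::set<int> ghosts;
    for (int cell : owned)
        for (int nb : adj[cell])
            if (!owned.count(nb))
                ghosts.insert(nb);                 // needed by the stencil, owned elsewhere
    return ghosts;
}
// A stencil of width k corresponds to applying this step k times,
// accumulating each new layer into the owned-plus-ghost set.

int main() {
    // 6 cells in a ring: 0-1-2-3-4-5-0
    Adjacency adj = {{5, 1}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 0}};
    std::set<int> owned = {0, 1, 2};
    for (int g : ghost_layer(adj, owned)) std::cout << g << ' ';  // prints 3 5
    std::cout << '\n';
}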
... POOMA [1] is an object-oriented framework for applications in computational science requiring high-performance execution. It is a library of C++ classes designed to represent common abstractions in these applications. ...
Article
Today’s high-performance computing environments, and the applications that must exploit them, have become much more complex than ever. We now build ensembles of large, shared-memory parallel computers, linked together with high-speed networks, in an attempt to achieve previously unheard-of speeds and still retain a ‘general-purpose’ capability for running diverse applications. Demands for greater precision and realism in today’s computer simulations of physical phenomena tax the imagination of the most aggressive system designers. Enhancing the accuracy of tomorrow’s simulations requires simultaneously accounting for more physical, chemical and biological components. Predictive simulation is essential for making informed, science-based decisions on questions of national importance, including stockpile stewardship, global climate change, wildfires, earthquakes and epidemics. These applications are too massive and inter-related to be built, verified, tuned and maintained by conventional methods. Software teams in the Advanced Computing Laboratory (ACL) at Los Alamos National Laboratory, along with collaborators world-wide, are building an integrated software infrastructure for scientific simulation development. This paper describes the ACL projects now underway in object-oriented frameworks, scalable run-time software, scientific visualization, software component architecture, and high-end and experimental computer systems. We include achieved results and the status of projects.
Article
Exploiting symmetries is important in numerical mathematics, both with respect to efficient memory usage and with respect to symmetry-exploiting algorithms. In this paper, the symmetries of tensors are in focus. A convenient notation for describing coordinate-free tensor symmetries is established, based on sets of permutations. Completely symmetric and antisymmetric tensors are included as special cases. The extensions to multidimensional arrays with other kinds of symmetries or invariant features are also treated. The symmetry information is used to represent tensors with symmetries more economically with respect to memory. In addition, three algorithms that exploit symmetries are presented. First, a Frobenius norm computation is derived. Second, a projection to an index space with general symmetries is shown, and proven to be optimal in the Frobenius norm. Third, a symmetry-utilizing formula for a dual mapping between completely antisymmetric index spaces is shown. The implementation of symmetry support in EinSum is discussed. EinSum is a C++ package primarily intended for tensor algebra, capable of supporting the Einstein summation convention. Details on the symmetry part of the implementation are explained. Code for the implementation of the Frobenius norm, the general projection, and the dual mapping is shown, illustrating how symmetry-aware software may decrease both the memory usage and the number of arithmetic operations.
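The memory saving available for symmetric tensors can be made concrete with the standard packed storage of a symmetric rank-2 tensor, sketched below (not the EinSum implementation): only the entries with i <= j are stored, so an n-by-n symmetric tensor occupies n(n+1)/2 numbers rather than n*n.

// Packed storage for a symmetric rank-2 tensor: a standard construction
// shown only to illustrate the kind of saving discussed above.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

class SymmetricMatrix {
public:
    explicit SymmetricMatrix(std::size_t n)
        : n_(n), data_(n * (n + 1) / 2, 0.0) {}    // n(n+1)/2 values instead of n*n

    double& operator()(std::size_t i, std::size_t j)       { return data_[index(i, j)]; }
    double  operator()(std::size_t i, std::size_t j) const { return data_[index(i, j)]; }

private:
    // Map (i, j) to the packed upper triangle; symmetry makes (i, j) and (j, i) identical.
    std::size_t index(std::size_t i, std::size_t j) const {
        if (i > j) std::swap(i, j);
        return i * n_ - i * (i - 1) / 2 + (j - i);
    }
    std::size_t n_;
    std::vector<double> data_;
};

int main() {
    SymmetricMatrix t(4);               // 10 stored values instead of 16
    t(3, 1) = 2.5;                      // writing (3,1) also defines (1,3)
    std::cout << t(1, 3) << '\n';       // prints 2.5
}

A completely symmetric tensor of higher rank generalizes the same idea by canonicalising (sorting) the full multi-index before the lookup.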
Conference Paper
MATLAB® is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. At MIT Lincoln Laboratory, MATLAB is used by technical staff to develop sensor processing algorithms. MATLAB's popularity is based on the availability of high-level abstractions leading to reduced code development time. Due to the compute-intensive nature of scientific computing, these applications often require long running times and would benefit greatly from increased performance offered by parallel computing. pMatlab (www.ll.mit.edu/pMatlab) implements partitioned global address space (PGAS) support via standard operator overloading techniques. The core data structures in pMatlab are distributed arrays and maps, which simplify parallel programming by removing the need for explicit message passing. This paper presents the pMatlab design and results for the HPC Challenge benchmark suite. Additionally, two case studies of pMatlab use are described.
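The distributed-array-plus-map idea can be sketched in C++ on a single process (hypothetical names, no real communication layer; not pMatlab's implementation): a map records which block of the global index range a rank owns, and the overloaded arithmetic touches only that block, so user code reads like ordinary array math.

// Single-process sketch of distributed arrays plus maps (hypothetical
// names; no message passing is performed here).
#include <cstddef>
#include <iostream>
#include <vector>

struct BlockMap {                       // 1-D block distribution over nprocs ranks
    std::size_t global_n, nprocs, rank;
    std::size_t lo() const { return rank * global_n / nprocs; }
    std::size_t hi() const { return (rank + 1) * global_n / nprocs; }
};

class DistVector {
public:
    DistVector(const BlockMap& map, double fill)
        : map_(map), local_(map.hi() - map.lo(), fill) {}

    // Element-wise add: each rank processes only the block it owns.
    DistVector operator+(const DistVector& rhs) const {
        DistVector out(map_, 0.0);
        for (std::size_t i = 0; i < local_.size(); ++i)
            out.local_[i] = local_[i] + rhs.local_[i];
        return out;
    }
    double local(std::size_t i) const { return local_[i]; }
private:
    BlockMap map_;
    std::vector<double> local_;
};

int main() {
    BlockMap map{1000, 4, 2};           // global size 1000, 4 ranks, this is rank 2
    DistVector a(map, 1.0), b(map, 2.0);
    DistVector c = a + b;               // looks like ordinary array arithmetic
    std::cout << c.local(0) << '\n';    // prints 3
}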
Conference Paper
Full-text available
This is a status report of a long-term research effort focusing on object-oriented modeling of parallel PDE solvers, based on finite difference methods on composite, structured grids. Two previous results of this effort are reviewed, the class libraries Cogito and Compose. Cogito is implemented in Fortran 90, with MPI for the message passing, and provides abstract data types for parallel composite-grid methods. Compose is in C++ and allows for fully object-oriented construction of PDE solvers by composition of objects. The object model behind Compose is described, and some research issues related to the refinement of the model are outlined. Finally, some recent results are presented, which are initial steps in addressing these issues.