Figure 13
The compression ratio for a 2X APAX compression run is shown. The three curves represent the maximum, minimum, and mean compression rate among the 1024 processes.
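The max/min/mean curves summarize per-rank compression ratios. The sketch below shows one way such statistics could be gathered with mpi4py; it is only an illustration under stated assumptions: zlib stands in for the proprietary APAX codec, and the field size and rank count are made up.

```python
# Minimal sketch: per-rank compression ratio, reduced to max/min/mean.
# zlib stands in for the proprietary APAX codec; sizes are hypothetical.
import zlib
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each rank owns a local block of simulation state (hypothetical field).
local_field = np.random.default_rng(comm.rank).normal(size=100_000)

raw = local_field.tobytes()
compressed = zlib.compress(raw, level=6)
ratio = len(raw) / len(compressed)          # local compression ratio

# Reduce across all ranks to obtain the three curves plotted per time step.
max_ratio = comm.allreduce(ratio, op=MPI.MAX)
min_ratio = comm.allreduce(ratio, op=MPI.MIN)
mean_ratio = comm.allreduce(ratio, op=MPI.SUM) / comm.size

if comm.rank == 0:
    print(f"max={max_ratio:.2f}  min={min_ratio:.2f}  mean={mean_ratio:.2f}")
```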

Source publication
Conference Paper
This paper examines whether lossy compression can be used effectively in physics simulations as a possible strategy to combat the expected data-movement bottleneck in future high performance computing architectures. We show that, for the codes and simulations we tested, compression levels of 3-5X can be applied without causing significant changes t...
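As a rough illustration of the kind of inline lossy compression the paper studies, the sketch below compresses a smooth 3-D field at a fixed error tolerance and reports the achieved ratio. It assumes the zfp Python bindings (zfpy) are installed; zfp is used here only as a readily available lossy compressor, not necessarily the codec evaluated in the paper, and the field and tolerance are invented.

```python
# Hedged sketch: lossy compression of a smooth 3-D field with zfpy,
# used here as an illustrative codec (not necessarily the one studied).
import numpy as np
import zfpy  # assumes the standard zfp Python bindings are installed

# Hypothetical smooth field, e.g. a pressure-like quantity on a 64^3 grid.
x, y, z = np.meshgrid(*([np.linspace(0.0, 1.0, 64)] * 3), indexing="ij")
field = np.sin(4 * np.pi * x) * np.cos(2 * np.pi * y) * np.exp(-z)

tolerance = 1e-4  # absolute error bound (made-up value)
compressed = zfpy.compress_numpy(field, tolerance=tolerance)
restored = zfpy.decompress_numpy(compressed)

ratio = field.nbytes / len(compressed)
max_err = np.max(np.abs(field - restored))
print(f"compression ratio {ratio:.1f}x, max abs error {max_err:.2e}")
```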

Similar publications

Article
This paper examines whether lossy compression can be used effectively in physics simulations as a possible strategy to combat the expected data-movement bottleneck in future high performance computing architectures. We show that, for the codes and simulations we tested, compression levels of 3–5X can be applied without causing significant changes t...

Citations

... Due to its popularity, Lulesh is widely used for testing new hardware and for applying optimizations, parallelization schemes, APIs, and more [26,44]. For instance, Laney et al. [45] used Lulesh and several other benchmarks to assess the effects of data compression on performance, as they are representative of real scientific applications. Bercea et al. [26] evaluated the effectiveness of OpenMP offloading capabilities on the NVIDIA Kepler K40m GPU (2015) using OpenMP version 4.0, which is the latest known evaluation of its kind [26]. ...
Chapter
Over the last decade, most of the increase in computing power has been gained through advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, their utilization requires code adaptations and transformations. Thus, OpenMP, the most common standard for multi-threading in scientific computing applications, has provided offloading capabilities between hosts (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to the market, with the oneAPI and NVHPC compilers for offloading, respectively. In this work, we present early performance results of OpenMP offloading to these devices, specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware in a representative scientific mini-app (the LULESH benchmark). Our results show that coverage of version 4.5 is nearly complete in both the latest NVHPC and oneAPI tools. However, we observed a lack of support for versions 5.0, 5.1, and 5.2, which is particularly noticeable with NVHPC. From the performance perspective, we found that the PVC1100 and A100 are relatively comparable on the LULESH benchmark. While the A100 is slightly better due to faster memory bandwidth, the PVC1100 scales to the next problem size (400³) thanks to its larger memory. The results are available at: https://github.com/Scientific-Computing-Lab-NRCN/Accel-OpenMP-Portability-Scalability.
... Due to its popularity, Lulesh is widely used for testing new hardware and for applying optimizations, parallelization schemes, APIs, and more [26,44]. For instance, Laney et al. [45] used Lulesh and several other benchmarks to assess the effects of data compression on performance, as they are representative of real scientific applications. Bercea et al. [26] evaluated the effectiveness of OpenMP offloading capabilities on the NVIDIA Kepler K40m GPU (2015) using OpenMP version 4.0, which is the latest known evaluation of its kind [26]. ...
Chapter
In situ visualization and analysis is a valuable yet underutilized commodity for the simulation community. There is hesitance or even resistance to adopting new methodologies due to the uncertainties that in situ holds for new users. There is a perceived implementation cost, maintenance cost, risk to simulation fault tolerance, potential lack of scalability, a new resource cost for running in situ processes, and more. The list of reasons why in situ is overlooked is long. We are attempting to break down this barrier by introducing Inshimtu. Inshimtu is an in situ “shim” library that enables users to try in situ before they buy into a full implementation. It does this by working with existing simulation output files, requiring no changes to simulation code. The core visualization component of Inshimtu is ParaView Catalyst, allowing it to take advantage of both interactive and non-interactive visualization pipelines that scale. We envision Inshimtu as a stepping stone to show users the value of in situ and motivate them to move to one of the many existing fully-featured in situ libraries available in the community. We demonstrate the functionality of Inshimtu with a scientific workflow on the Shaheen II supercomputer. Inshimtu is available for download at: https://github.com/kaust-vislab/Inshimtu-basic.
... Due to its popularity, Lulesh is widely used for testing new hardware and for applying optimizations, parallelization schemes, APIs, and more [43,25]. For instance, Laney et al. [44] used Lulesh and several other benchmarks to assess the effects of data compression on performance, as they are representative of real scientific applications. Bercea et al. [25] evaluated the effectiveness of OpenMP offloading capabilities on the NVIDIA Kepler K40m GPU (2015) using OpenMP version 4.0, which is the latest known evaluation of its kind [25]. ...
Preprint
Over the last decade, most of the increase in computing power has been gained through advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, their utilization requires code adaptations and transformations. Thus, OpenMP, the most common standard for multi-threading in scientific computing applications, has provided offloading capabilities between hosts (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to the market, with the oneAPI and GNU LLVM-backed compilation for offloading, respectively. In this work, we present early performance results of OpenMP offloading to these devices, specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware in a representative scientific mini-app (the LULESH benchmark). Our results show that the vast majority of the offloading directives in v4.5 and v5.0 are supported in the latest oneAPI and GNU compilers; however, support for v5.1 and v5.2 is still lacking. From the performance perspective, we found that PVC is up to 37% better than the A100 on the LULESH benchmark, presenting better performance in both computation and data movement.
... Due to its popularity, LULESH is commonly used for testing new hardware, applying optimizations, parallelization schemes, APIs, and more [32], [43]. For example, Laney et al. [46] assessed the effects of data compression on the performance of LULESH and several other benchmarks, as they are the most representative of real scientific applications. Furthermore, Bercea et al. [43] evaluated the effectiveness of the OpenMP version on a GPU. ...
Preprint
Supercomputers worldwide provide the necessary infrastructure for groundbreaking research. However, most supercomputers are not designed equally, owing to different desired figures of merit, which are derived from the computational bounds of the targeted scientific applications' portfolio. In turn, the design of such computers becomes an optimization process that strives to achieve the best possible performance in a multi-parameter search space. Therefore, verifying and evaluating whether a supercomputer can achieve its desired goal becomes a tedious and complex task. For this purpose, many full, mini, proxy, and benchmark applications have been introduced in an attempt to represent scientific applications. Nevertheless, as these benchmarks are hard to expand and, most importantly, are over-simplified compared to scientific applications that tend to couple multiple scientific domains, they fail to represent the true scaling capabilities. We suggest a new physical, scalable benchmark framework, namely ScalSALE, based on the well-known SALE scheme. ScalSALE's main goal is to provide a simple, flexible, scalable infrastructure that can be easily expanded to include multi-physical schemes while maintaining scalable and efficient execution times. By expanding ScalSALE, the gap between the over-simplified benchmarks and scientific applications can be bridged. To achieve this goal, ScalSALE is implemented in Modern Fortran with simple OOP design patterns and supported by transparent MPI-3 blocking and non-blocking communication that allows such a scalable framework. ScalSALE is compared to LULESH by simulating the Sedov-Taylor blast wave problem using strong and weak scaling tests. ScalSALE is executed and evaluated with both rezoning options, Lagrangian and Eulerian.
... As scientific datasets continue to grow in size and complexity, adaptive representations have become key to enabling interactive analysis and visualization [1]. Such representations can reduce the memory footprint and processing costs of large-scale data by orders of magnitude, often without perceptible degradation of visualization quality or analysis results [2]. However, existing approaches are limited to either compressed representations of regular grids [3], [4] or multiresolution structures, such as octrees and k-d trees [5], [6]. ...
Article
Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of large-scale data. Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or adapting data resolution, e.g., using spatial hierarchies. Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision simultaneously, can offer significant gains over using them individually. However, there currently exist no practical solutions to creating and evaluating such representations at scale. In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction schemes and offer an interface to existing tools and algorithms. Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides considerable reduction in the mesh size. AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation. AMM also supports representing the function using mixed-precision values to further the achievable gains in data reduction. We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and demonstrate AMM on six types of resolution and precision data streams. By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and computational advantages of our representation for visualization techniques. With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to the community, which we hope will foster new opportunities and future data reduction schemes.
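The resolution-versus-precision tradeoff discussed in this abstract can be illustrated with a far simpler block-wise scheme than AMM itself: for each block of a sampled field, keep the cheapest of a coarsened, a reduced-precision, or a full-precision representation that meets a user tolerance. The sketch below is only a toy version of that idea under assumed parameters (block size, tolerance, helper names are invented) and is not the AMM data structure.

```python
# Toy resolution/precision adaptation (NOT AMM): per block, pick the
# cheapest representation that stays within an absolute error tolerance.
import numpy as np

def coarsen(block):
    """2x downsample by averaging 2x2 cells (resolution reduction)."""
    return block.reshape(block.shape[0] // 2, 2, block.shape[1] // 2, 2).mean(axis=(1, 3))

def reconstruct_coarse(coarse):
    """Nearest-neighbour upsample back to the original block shape."""
    return np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)

def choose_representation(block, tol):
    """Return (tag, payload) for the cheapest admissible representation."""
    coarse = coarsen(block)
    if np.max(np.abs(block - reconstruct_coarse(coarse))) <= tol:
        return "coarse", coarse                     # 4x fewer samples
    half = block.astype(np.float16)
    if np.max(np.abs(block - half.astype(np.float64))) <= tol:
        return "float16", half                      # 4x fewer bytes
    return "float64", block                         # full resolution/precision

# Hypothetical smooth 2-D field with a localized feature.
x, y = np.meshgrid(np.linspace(0, 2 * np.pi, 64), np.linspace(0, 2 * np.pi, 64), indexing="ij")
data = np.sin(x) * np.cos(y) + 0.5 * np.exp(-20.0 * ((x - np.pi) ** 2 + (y - np.pi) ** 2))

tol, B = 0.02, 8
tags = [choose_representation(data[i:i + B, j:j + B], tol)[0]
        for i in range(0, 64, B) for j in range(0, 64, B)]
print({t: tags.count(t) for t in set(tags)})  # mix of coarse / float16 / float64 blocks
```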
... Indeed, there exist models that are insensitive to a considerable reduction in the width of the floating-point significand (Hatfield et al. 2018). The use of lossy data compression techniques has also been advocated by showing that compression effects are often unimportant or disappear in post-processing analyses (Baker et al. 2016), and substantial gains in compression ratio can be achieved while keeping the error at an acceptable level in terms of physically motivated metrics (Laney et al. 2013). Consequently, it appears reasonable to adjust the restart data storage to the precision justified by the level of model error. ...
Article
Full-text available
A wavelet-based method for compression of three-dimensional simulation data is presented and its software framework is described. It uses wavelet decomposition and subsequent range coding with quantization suitable for floating-point data. The effectiveness of this method is demonstrated by applying it to example numerical tests, ranging from idealized configurations to realistic global-scale simulations. The novelty of this study is in its focus on assessing the impact of compression on post-processing and restart of numerical simulations.
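The pipeline this abstract describes (wavelet transform, quantization, entropy coding) can be mocked up with PyWavelets. The sketch below uses uniform quantization of the wavelet coefficients and zlib as a stand-in for the range coder, so the ratio and error it reports are only indicative; the field, wavelet, level, and quantization step are all assumptions, not the paper's settings.

```python
# Sketch of a wavelet + quantization + entropy-coding pipeline.
# PyWavelets supplies the transform; zlib stands in for the range coder.
import zlib
import numpy as np
import pywt

rng = np.random.default_rng(1)
data = rng.normal(size=(64, 64, 64)).cumsum(axis=2)  # hypothetical 3-D field

# Multi-level wavelet decomposition, flattened to a single coefficient array.
coeffs = pywt.wavedecn(data, wavelet="db2", level=3)
flat, slices = pywt.coeffs_to_array(coeffs)

# Uniform quantization with a fixed step (made-up value), then "entropy code".
step = 0.05
quantized = np.round(flat / step).astype(np.int32)
payload = zlib.compress(quantized.tobytes(), level=9)

# Decode and invert the transform to measure the reconstruction error.
dequantized = quantized.astype(np.float64) * step
restored = pywt.waverecn(
    pywt.array_to_coeffs(dequantized, slices, output_format="wavedecn"),
    wavelet="db2")[:64, :64, :64]

print(f"ratio {data.nbytes / len(payload):.1f}x, "
      f"max abs error {np.max(np.abs(data - restored)):.3f}")
```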
... The use of lossy compression in numerical simulation has been suggested before: it has been considered for checkpointing numerical simulations [5] and for inline compression of the solution state in [14]. In both cases, it was demonstrated that lossy compression can be used without causing significant changes to important physical quantities. ...
Article
Currently, the dominating constraint in many high performance computing applications is data capacity and bandwidth, in both internode communications and even more so in intranode data motion. A new approach to address this limitation is to make use of data compression in the form of a compressed data array. Storing data in a compressed data array and converting to standard IEEE-754 types as needed during a computation can reduce the pressure on bandwidth and storage. However, repeated conversions (lossy compression and decompression) introduce additional approximation errors, which need to be shown to not significantly affect the simulation results. We extend recent work [J. Diffenderfer et al., SIAM J. Sci. Comput., 41 (2019), pp. A1867-A1898] that analyzed the error of a single use of compression and decompression of the ZFP compressed data array representation [P. Lindstrom, IEEE Trans. Vis. Comput. Graph., 20 (2014), pp. 2674-2683; P. Lindstrom, ZFP version 0.5.5, May 2019] to the case of time-stepping and iterative schemes, where an advancement operator is repeatedly applied in addition to the conversions. We show that the accumulated error for iterative methods involving fixed-point and time-evolving iterations is bounded under standard constraints. An upper bound is established on the number of additional iterations required for the convergence of stationary fixed-point iterations. An additional analysis of traditional forward and backward error of stationary iterative methods using ZFP compressed arrays is also presented. The results of several 1D, 2D, and 3D test problems are provided to demonstrate the correctness of the theoretical bounds.
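A minimal way to see the effect analyzed here is to run the same fixed-point iteration twice, once on plain arrays and once with the state round-tripped through fixed-rate zfp compression after every step, then compare the iterates. The sketch below does that with the zfpy bindings and a simple Jacobi-style smoother; the operator, rate, grid size, and step count are hypothetical and this is not the paper's experimental setup.

```python
# Sketch: fixed-point iteration with the state stored compressed (zfpy)
# and decompressed each step, compared against an uncompressed run.
import numpy as np
import zfpy  # assumes the standard zfp Python bindings are installed

def smooth(u):
    """One Jacobi-like averaging step with fixed boundary values (non-expansive)."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return v

rng = np.random.default_rng(2)
u_exact = rng.normal(size=(128, 128))
u_zfp = u_exact.copy()

rate = 16.0  # bits per value, fixed-rate mode (made-up choice)
for _ in range(200):
    u_exact = smooth(u_exact)
    # Advance, then round-trip the new state through compression, as a
    # compressed-array backing store would.
    u_zfp = zfpy.decompress_numpy(zfpy.compress_numpy(smooth(u_zfp), rate=rate))

drift = np.max(np.abs(u_exact - u_zfp))
print(f"max deviation after 200 steps at {rate} bits/value: {drift:.2e}")
```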
... As scientific datasets continue to grow in size and complexity, adaptive representations are key to enabling interactive analysis and visualization [11]. Adaptive meshes can reduce both the memory footprint and processing costs of large-scale data by orders of magnitude, often without any perceptible impact on visualization quality or analysis results [43]. However, existing approaches are limited to either compressed representations [37,44,45] of regular grids or multiresolution structures such as octrees and k-d trees [5,60,70]. ...
Preprint
We present Adaptive Multilinear Meshes (AMM), a new framework that significantly reduces the memory footprint compared to existing data structures. AMM uses a hierarchy of cuboidal cells to create a continuous, piecewise multilinear representation of uniformly sampled data. Furthermore, AMM can selectively relax or enforce constraints on conformity, continuity, and coverage, creating a highly adaptive and flexible representation to support a wide range of use cases. AMM supports incremental updates in both spatial resolution and numerical precision, establishing the first practical data structure that can seamlessly explore the tradeoff between resolution and precision. We use tensor products of linear B-spline wavelets to create an adaptive representation and illustrate the advantages of our framework. AMM provides a simple interface for evaluating the function defined on the adaptive mesh, efficiently traversing the mesh, and manipulating the mesh, including incremental, partial updates. Our framework is easy to adopt for standard visualization and analysis tasks. As an example, we provide a VTK interface, through efficient on-demand conversion, which can be used directly by corresponding tools, such as VisIt, disseminating the advantages of faster processing and a smaller memory footprint to a wider audience. We demonstrate the advantages of our approach for simplifying scalar-valued data for commonly used visualization and analysis tasks using incremental construction driven by mixed-resolution and mixed-precision data streams.
... To optimize these interactions, applications often use data compression [38] to reduce the amount of data transmitted between these components or to external storage. For instance, applications such as AstroPortal [39], Community Earth System Model [40], and Particle Physics simulations [41] utilize compression to reduce the cost of data movement within the application. Similarly, many HPC applications [38], [42] perform data compression to reduce the amount of intermediate data produced in the staging servers. ...
... The use of lossy compression in numerical simulation has been suggested before: it has been considered for checkpointing numerical simulations [5] and for inline compression of the solution state in [14]. In both cases, it was demonstrated that lossy compression can be used without causing significant changes to important physical quantities. ...
Preprint
Currently, the dominating constraint in many high performance computing applications is data capacity and bandwidth, in both inter-node communications and even more so in on-node data motion. A new approach to address this limitation is to make use of data compression in the form of a compressed data array. Storing data in a compressed data array and converting to standard IEEE-754 types as needed during a computation can reduce the pressure on bandwidth and storage. However, repeated conversions (lossy compression and decompression) introduce additional approximation errors, which need to be shown to not significantly affect the simulation results. We extend recent work [J. Diffenderfer, et al., Error Analysis of ZFP Compression for Floating-Point Data, SIAM Journal on Scientific Computing, 2019] that analyzed the error of a single use of compression and decompression of the ZFP compressed data array representation [P. Lindstrom, Fixed-rate compressed floating-point arrays, IEEE Transactions on Visualization and Computer Graphics, 2014] to the case of time-stepping and iterative schemes, where an advancement operator is repeatedly applied in addition to the conversions. We show that the accumulated error for iterative methods involving fixed-point and time-evolving iterations is bounded under standard constraints. An upper bound is established on the number of additional iterations required for the convergence of stationary fixed-point iterations. An additional analysis of traditional forward and backward error of stationary iterative methods using ZFP compressed arrays is also presented. The results of several 1D, 2D, and 3D test problems are provided to demonstrate the correctness of the theoretical bounds.