Book

Encyclopedia of Parallel Computing

January 2011

January 2011

DOI:10.1007/978-0-387-09766-4

ISBN: 978-0-387-09765-7

Authors:

David Padua

University of Illinois, Urbana-Champaign

Containing over 300 entries in an A-Z format, the Encyclopedia of Parallel Computing provides easy, intuitive access to relevant information for professionals and researchersseeking access to any aspect within the broad field of parallel computing. Topics for this comprehensive reference were selected, written, and peer-reviewed by an international pool of distinguished researchers in the field. The Encyclopedia is broad in scope, covering machine organization, programming languages, algorithms, and applications. Within each area, concepts, designs, and specific implementations are presented. The highly-structured essays in this work comprise synonyms, a definition and discussion of the topic, bibliographies, and links to related literature. Extensive cross-references to other entries within the Encyclopedia support efficient, user-friendly searchers for immediate access to useful information. Key concepts presented in the Encyclopedia of Parallel Computing include; laws and metrics; specific numerical and non-numerical algorithms; asynchronous algorithms; libraries of subroutines; benchmark suites; applications; sequential consistency and cache coherency; machine classes such as clusters, shared-memory multiprocessors, special-purpose machines and dataflow machines; specific machines such as Cray supercomputers, IBMs cell processor and Intels multicore machines; race detection and auto parallelization; parallel programming languages, synchronization primitives, collective operations, message passing libraries, checkpointing, and operating systems. Topics covered: Speedup, Efficiency, Isoefficiency, Redundancy, Amdahls law, Computer Architecture Concepts, Parallel Machine Designs, Benmarks, Parallel Programming concepts & design, Algorithms, Parallel applications. This authoritative reference will be published in two formats: print and online. The online edition features hyperlinks to cross-references and to additional significant research. Related Subjects: supercomputing, high-performance computing, distributed computing

Reconfigurable Computers

Chapter

Jan 2011
Encyclopedia of Parallel Computing

HPF (High Performance Fortran)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

UPC

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Quadrics

Chapter

Jan 2011
Encyclopedia of Parallel Computing
pp.1677-1689

Atomic Operations

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Connected Components Algorithm

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Half Vector Length

Chapter

Jan 2011
Encyclopedia of Parallel Computing

OS, Light-Weight

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Microprocessors

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Quicksort

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Radix Sort

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Relaxed Memory Consistency Models

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Speculation

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Speedup

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Vectorization

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Work-Depth Model

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Communicating Sequential Processes (CSP)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

HEP, Denelcor

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Java

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Haskell

Chapter

Jan 2011
Encyclopedia of Parallel Computing

High Performance Fortran (HPF)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

TBB (Intel Threading Building Blocks)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Verification of Parallel Shared-Memory Programs, Owicki-Gries Method of Axiomatic

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Tasks

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Threads

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Quantum Chromodynamics (QCD) Computations

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Scheduling

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Task Mapping, Topology Aware

Chapter

Jan 2011
Encyclopedia of Parallel Computing

COW

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Copy

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Optimistic Loop Parallelization

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Thread Level Speculation (TLS) Parallelization

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Race

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Workflow Scheduling

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Hierarchical Data Format

Chapter

Jan 2011
Encyclopedia of Parallel Computing

CML

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Whole Program Analysis

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Tuning and Analysis Utilities

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Vampir 7

Chapter

Jan 2011
Encyclopedia of Parallel Computing

VampirServer

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Compiler Optimizations for Array Languages

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Execution Ordering

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Automated Tuning

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Software Autotuning

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Weak Scaling

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Sparse Gaussian Elimination

Chapter

Jan 2011
Encyclopedia of Parallel Computing

System Integration

Chapter

Jan 2011
Encyclopedia of Parallel Computing

VLSI Algorithmics

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Overdetermined Systems

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Underdetermined Systems

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Speculative Multithreading (SM)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Speculative Parallelization

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Speculative Threading

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Thread-Level Data Speculation (TLDS)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

TLS

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Hazard (in Hardware)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Tensilica

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Runtime System

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Cray MTA

Chapter

Jan 2011
Encyclopedia of Parallel Computing

The High Performance Substrate

Chapter

Jan 2011
Encyclopedia of Parallel Computing

High-Level I/O Library

Chapter

Jan 2011
Encyclopedia of Parallel Computing

TStreams

Chapter

Jan 2011
Encyclopedia of Parallel Computing

State Space Search

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Omega Calculator

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Omega Library

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Omega Project

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Use-Def Chains

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Consistent Hashing

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Open Distributed Systems

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Overlay Network

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Out-of-Order Execution Processors

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Stalemate

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Coordination

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Exaop Computing

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Router Architecture

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Network Offload

Chapter

Jan 2011
Encyclopedia of Parallel Computing

HT3.10

Chapter

Jan 2011
Encyclopedia of Parallel Computing

3GIO

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Congestion Control

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Multistage Interconnection Networks

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Hypercube

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Ring

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Community Atmosphere Model (CAM)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Community Climate System Model (CCSM)

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Terrestrial Ecosystem Modeling

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Uncertainty Quantification

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Quantum Chemistry

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Reconstruction of Evolutionary Trees

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Wavefront Arrays

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Thin Ethernet

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Eventual Values

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Multiprocessor Networks

Chapter

Jan 2011
Encyclopedia of Parallel Computing

LANai

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Hyperplane Partitioning

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Complex Event Processing

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Event Stream Processing

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Collect

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Concatenation

Chapter

Jan 2011
Encyclopedia of Parallel Computing

Hierarchical Semi-Sparse Cubes—Parallel Framework for Storing Multi-Modal Big Data in HDF5

Article

Full-text available

Jan 2023

Since Moore‘s law applies also to data detectors, the volume of data collected in astronomy doubles approximately every year. A prime example is the upcoming Square Kilometer Array (SKA) instrument that will produce approximately 8.5 Exabytes over the first 15 years of service, starting in the year 2027. Storage capacities for these data have grown as well, and primary analytical tools have also kept up. However, the tools for combining big data from several such instruments still lag behind. Having the ability to easily combine big data is crucial for inferring new knowledge about the universe from the correlations and not only finding interesting information in these huge datasets but also their combinations. In this article, we present a revised version of the Hierarchical Semi-Sparse Cube (HiSS-Cube) framework. It aims to provide highly parallel processing of combined multi-modal multi-dimensional big data. The main contributions of this study are as follows: 1) Highly parallel construction of a database built on top of the HDF5 framework. This database supports parallel queries. 2) Design of a database index on top of HDF5 that can be easily constructed in parallel. 3) Support of efficient multi-modal big data combinations. We tested the scalability and efficiency on big astronomical spectroscopic and photometric data obtained from the Sloan Digital Sky Survey. The performance of HiSS-Cube is bounded by the I/O bandwidth and I/O operations per second of the underlying parallel file system, and it scales linearly with the number of I/O nodes.

Simuladores de redes de interconexión: un estudio comparativo y evaluación. (Interconnection Network Simulators: A comparative study and evaluation)

Research

Full-text available

Jul 2022

Interconnection network simulators are nowadays critical for the analysis and evaluation of networks used in high performance computing (HPC) systems. This dissertation focuses on network simulation tools from the perspective of the router architecture. Among the multiple domains of use of interconnection networks, this work focuses on system area networks (SANs), which interconnect multiple computing nodes in HPC and data centres systems. As part of this project, a state-of-the-art analysis of these simulation tools is presented. Three of these simulators, CAMINOS, BookSim and SuperSim, are then selected for further study. It should be highlighted that CAMINOS is implemented in the Rust language, while the other two use C++. A comparative study has been conducted on the selected tools, according to the software modularity, their configuration syntax, or the behaviour of the simulations. In addition to the comparative study, an evaluation of these three simulators has been performed through a series of metrics. These metrics cover both the functionality and the performance of these tools. The functional evaluation corroborates that the simulation results, such as accepted load or network latency, are similar between the simulators. The performance part is related to the resource usage, such as memory or runtime, used by these three tools. During comparison and evaluation, certain shortcomings have been detected regarding the modularity of the router modelled in CAMINOS. These shortcomings have led to the design and the partial implementation of a new router model, as well as the development of three new allocators. These proposals have been empirically validated. As main conclusions, it has been found that CAMINOS achieves functional results and execution times similar to those of BookSim, and a reduction of memory consumption of more than half.

Influence of Non-Newtonian Viscosity on Flow Structures and Wall Deformation in Compliant Serpentine Microchannels: A Numerical Study

Article

Full-text available

Aug 2023

The viscosity of fluid plays a major role in the flow dynamics of microchannels. Viscous drag and shear forces are the primary tractions for microfluidic fluid flow. Capillary blood vessels with a few microns diameter are impacted by the rheology of blood flowing through their conduits. Hence, regenerated capillaries should be able to withstand such impacts. Consequently, there is a need to understand the flow physics of culture media through the lumen of the substrate as it is one of the vital promoting factors for vasculogenesis under optimal shear conditions at the endothelial lining of the regenerated vessel. Simultaneously, considering the diffusive role of capillaries for ion exchange with the surrounding tissue, capillaries have been found to reorient themselves in serpentine form for modulating the flow conditions while developing sustainable shear stress. In the current study, S-shaped (S1) and delta-shaped (S2) serpentine models of capillaries were considered to evaluate the shear stress distribution and the oscillatory shear index (OSI) and relative residual time (RRT) of the derivatives throughout the channel (due to the phenomena of near-wall stress fluctuation), along with the influence of culture media rheology on wall stress parameters. The non-Newtonian power-law formulation was implemented for defining rheological viscosity of the culture media. The flow actuation of the media was considered to be sinusoidal and physiological, realizing the pulsatile blood flow behavior in the circulatory network. A distinct difference in shear stress distributions was observed in both the serpentine models. The S1 model showed higher change in shear stress in comparison to the S2 model. Furthermore, the non-Newtonian viscosity formulation was found to produce more sustainable shear stress near the serpentine walls compared to the Newtonian formulation fluid, emphasizing the influence of rheology on stress generation. Further, cell viability improved in the bending regions of serpentine channels compared to the long run section of the same channel.

Une nouvelle approche basée sur l’apprentissage profond pour l’optimisation automatique du code dans le compilateur Tiramisu

Thesis

Full-text available

Jul 2023

Mohamed Islem Kara Bernou

Les techniques d'optimisation automatique de code permettent d'améliorer les performances des programmes notamment le temps d’exécution. Ces techniques visent à transformer les programmes pour exploiter plus efficacement le matériel utilisé, en explorant l'espace des optimisations possibles pour choisir les plus efficaces. Les implémentations efficaces de ces techniques utilisent généralement des modèles de coût basés sur l'apprentissage automatique/profond afin d'évaluer l'effet des optimisations explorées. Dans ce travail, nous proposons un modèle de coût basé sur l'apprentissage profond qui vise à prédire l'accélération obtenue suite à l'application d'une séquence de transformations de code de manière plus précise par rapport à l'approche actuelle utilisée dans le compilateur Tiramisu. Ce nouveau modèle a l'avantage de supporter une plus large gamme de programmes, ce qui permet de meilleures optimisations et de meilleures accélérations obtenues pour les programmes du monde réel. Le modèle de coût proposé atteint une erreur absolue moyenne en pourcentage de 19.95% pour prédire les accélérations des programmes optimisés.

L’apprentissage automatique pour l’optimisation automatique du code - État de l’art et perspectives

Thesis

Full-text available

Apr 2023

Mohamed Islem Kara Bernou

Les techniques d’optimisation automatique de code permettant d’améliorer les performances des programmes notamment le temps d’exécution. Ces techniques visent à transformer les programmes pour exploiter plus efficacement le matériel utilisé, en explorant l’espace des optimisations possibles pour choisir les plus efficaces. Les implémentations efficaces de ces techniques utilisent généralement des modèles de coût basés sur l’apprentissage automatique/profond afin d’évaluer si l’application d’une séquence de transformations de code réduit le temps d’exécution du programme. La conception de tels modèles nécessite une réflexion sur plusieurs aspects, notant la représentation de l’entrée du modèle, son architecture et même la fonction de perte utilisée pour l’entraîner. Tous ces aspects jouent un rôle crucial dans la performance globale de l’optimisation automatique de ces compilateurs. L’objectif de ce mémoire de Master est de faire une étude sur l’utilisation de l’apprentissage profond dans l’optimisation automatique de code, couvrant les différents aspects du domaine.

Formalisms for Enterprise Application Integration (EAI): A Survey of Methodologies

Article

Full-text available

Mar 2023

Over the years, the number of applications supporting enterprise business processes has increased. The challenge of integrating diverse systems is one of the many reasons why many organizations fail to achieve greater automation. To overcome this obstacle, they are turning to Enterprise Application Integration (EAI). Enterprise Application Integration is a process that enables the integration of different applications. This allows the users to easily modify the functionality, share the information among the various applications and reuse the methods. The paper presents a formal method that includes the various levels of EAI. It highlights the various formal methods that can be used to achieve EAIs seamless interoperation. It also supports the concurrent and dynamic system. This paper also proposes a new architecture for EAI that will help them achieve their goals. There are many formal methods for programming languages in software engineering, but most of them are not adequate for the development of complex systems. The author proposes a new methodology based on Petri net which is a graphical representation of semantics.

Formalisms for Enterprise Application Integration (EAI): A Survey of Methodologies

Article

Mar 2023

Over the years, the number of applications supporting enterprise business processes has increased. The challenge of integrating diverse systems is one of the many reasons why many organizations fail to achieve greater automation. To overcome this obstacle, they are turning to Enterprise Application Integration (EAI). Enterprise Application Integration is a process that enables the integration of different applications. This allows the users to easily modify the functionality, share the information among the various applications and reuse the methods. The paper presents a formal method that includes the various levels of EAI. It highlights the various formal methods that can be used to achieve EAI's seamless interoperation. It also supports concurrent and dynamic systems. This paper also proposes a new architecture for EAI that will help them achieve their goals. There are many formal methods for programming languages in software engineering, but most of them are not adequate for the development of complex systems. The author proposes a new methodology based on Petri net which is a graphical representation of semantics.

High-performance hard-input LDPC decoding on multi-core devices for optical space links

Article

Feb 2023
J SYST ARCHITECT

Impact of HPC and Automated CFD Simulation Processes on Virtual Product Development : A Case Study

Article

Full-text available

Jan 2021

High-performance computing (HPC) enables both academia and industry to accelerate simulation-driven product development processes by providing a massively parallel computing infrastructure. In particular, the automation of high-fidelity computational fluid dynamics (CFD) analyses aided by HPC systems can be beneficial since computing time decreases while the number of significant design iterations increases. However, no studies have quantified these effects from a product development point of view yet. This article evaluates the impact of HPC and automation on product development by studying a formula student racing team as a representative example of a small or medium-sized company. Over several seasons, we accompanied the team, and provided HPC infrastructure and methods to automate their CFD simulation processes. By comparing the team’s key performance indicators (KPIs) before and after the HPC implementation, we were able to quantify a significant increase in development efficiency in both qualitative and quantitative aspects. The major aerodynamic KPI increased up to 115%. Simultaneously, the number of expedient design iterations within one season increased by 600% while utilizing HPC. These results prove the substantial benefits of HPC and automation of numerical-intensive simulation processes for product development.

Artificial Intelligence, Big Data Analytics, and Smart Cities

Chapter

Full-text available

Jan 2022

Modern urban life is seeing an increasing rate of adoption of artificial intelligence and smart solutions; however, citizens are still struggling to keep up the pace, and the rate at which they acquire skills and knowledge around artificial intelligence and data analysis in smart cities is lagging behind. This paper is an attempt to determine which digital skills are necessary when dealing with smart cities. This article is structured as follows: we first refer to the two basic and fundamental branches of artificial intelligence and continue with applications that exist in these branches regarding smart environments. The research contribution of this article is important since it is one of the few in the international literature dealing with all branches of AI and big data (e.g., machine learning and rule-based applications) in smart cities. The conclusion of the present work is that there is an urgent need to create an education system in the new concepts of AI and big data analysis not only for scientists but also for citizens.KeywordsArtificial intelligenceBig dataSmart citiesDigital skills

Impact of HPC and Automated CFD Simulation Processes on Virtual Product Development—A Case Study

Article

Full-text available

Jul 2021

Distributed Computing and Programming

Chapter

Feb 2024

We investigate distributed programming in C++ and other asynchronous many-task runtime systems. We also discuss data distribution, distributed I/O, and serialization which are additional things to consider for distributed applications. Lastly, we implement the fractal set using MPI and OpenMP.

Mathematics of a Process Algebra Inspired by Whitehead’s Process and Reality: A Review

Article

Full-text available

Jun 2024

William Sulis

Process algebras have been developed within computer science and engineering to address complicated computational and manufacturing problems. The process algebra described herein was inspired by the Process Theory of Whitehead and the theory of combinatorial games, and it was developed to explicitly address issues particular to organisms, which exhibit generativity, becoming, emergence, transience, openness, contextuality, locality, and non-Kolmogorov probability as fundamental characteristics. These features are expressed by neurobehavioural regulatory systems, collective intelligence systems (social insect colonies), and quantum systems as well. The process algebra has been utilized to provide an ontological model of non-relativistic quantum mechanics with locally causal information flow. This paper provides a pedagical review of the mathematics of the process algebra.

Lightweight Hardware-Based Cache Side-Channel Attack Detection for Edge Devices (Edge-CaSCADe)

Article

May 2024

Cache Side Channel Attacks (CSCA) have been haunting most processor architectures for decades now. Existing approaches to mitigation of such attacks have certain drawbacks namely software mishandling, performance overhead, low throughput due to false alarms, etc. Hence, “mitigation only when detected” should be the approach to minimize the effects of such drawbacks. We propose a novel methodology of fine-grained detection of timing-based CSCA using a hardware-based detection module. We discuss the design, implementation, and use of our proposed detection module in processor architectures. Our approach successfully detects attacks that flush secret victim information from cache memory like Flush+Reload, Flush+Flush, Prime+Probe, Evict+Probe, and Prime+Abort, commonly known as cache timing attacks. Detection is on time with minimal performance overhead. The parameterizable number of counters used in our module allows detection of multiple attacks on multiple sensitive locations simultaneously. The fine-grained nature ensures negligible false alarms, severely reducing the need for any unnecessary mitigation. The proposed work is evaluated by synthesizing the entire detection algorithm as an attack detection block, Edge-CaSCADe, in a RISC-V processor as a target example. The detection results are checked under different workload conditions with respect to the number of attackers, the number of victims having RSA,AES and ECC based encryption schemes like ECIES, and on benchmark applications like MiBench and Embench. More than $98\% $ detection accuracy within $2\% $ of the beginning of an attack can be achieved with negligible false alarms. The detection module has an area and power overhead of $0.9\% $ to $2\% $ and $1\% $ to $2.1\% $ for the targeted RISC-V processor core without cache for 1 to 5 counters, respectively. The detection module does not affect the processor critical path and hence has no impact on its maximum operating frequency.

An Application-Driven Method for Assembling Numerical Schemes for the Solution of Complex Multiphysics Problems

Article

Full-text available

Apr 2024

Within recent years, considerable progress has been made regarding high-performance solvers for partial differential equations (PDEs), yielding potential gains in efficiency compared to industry standard tools. However, the latter largely remains the status quo for scientists and engineers focusing on applying simulation tools to specific problems in practice. We attribute this growing technical gap to the increasing complexity and knowledge required to pick and assemble state-of-the-art methods. Thus, with this work, we initiate an effort to build a common taxonomy for the most popular grid-based approximation schemes to draw comparisons regarding accuracy and computational efficiency. We then build upon this foundation and introduce a method to systematically guide an application expert through classifying a given PDE problem setting and identifying a suitable numerical scheme. Great care is taken to ensure that making a choice this way is unambiguous, i.e., the goal is to obtain a clear and reproducible recommendation. Our method not only helps to identify and assemble suitable schemes but enables the unique combination of multiple methods on a per-field basis. We demonstrate this process and its effectiveness using different model problems, each comparing the resulting numerical scheme from our method with the next best choice. For both the Allen-Cahn and advection equations, we show that substantial computational gains can be attained for the recommended numerical methods regarding accuracy and efficiency. Lastly, we outline how one can systematically analyze and classify a coupled multiphysics problem of considerable complexity with six different unknown quantities, yielding an efficient, mixed discretization that in configuration compares well to high-performance implementations from the literature.

Numerical Analysis of Scaffold Degradation in cryogenic environment: Impact of Cell Migration and Cell Apoptosis

Article

Full-text available

Mar 2024

The analysis of degradation in the presence of cell death and migration is a critical aspect for biological fields. In present numerical study of degradation of scaffold were performed in present of cells, apoptosis and migration. The parameters; temperature, stress, strain tensor and deformation gradient associated with the degradation of polyelectrolyte complex scaffold were evaluated. Result shows that in both geometries minimum temperature had been achieved as 230.051 K at point P4 in series view and parallel view and at a point P3 for cell migration study for -5 and -1 k/min, respectively. The maximum stress had been generated for 5.57 x10^7 N/m2 for the temperature gradient of -2 K/min at T cycle in the case of cell migration study. In contrast in series view the maximum stress 2.9 x 107 N/m2 were observed at P4 which was higher as compare to P3. Similarly, for a parallel view, maximum stress (3.93 x 107 N/m2) was obtained for point P3. It had been observed that the maximum strain tensor 5.21 x 10^-3, 5.15 x 10^-3 and 5.26 x 10^-3 was generated in series view at 230 k on a point P3 for - 1, -2 and -5 K/min, respectively. Similarly, the maximum strain tensor 8.16 x 10^-3, 8.09 x 10^-3 and 8.09 x 10^-3 was generated in parallel view at 230 k on a point P3 for -1, -2 and -5 K/min, respectively. In the presence of cells, at a point P4 for temperature gradient of -1 and -2 K/min, it had been closed to the scaffold wall, which had a different temperature profile than the point P3 and scaffold comes to the contact with the cells. The analysis of PEC scaffold degradation offers significant insights into the relationship between scaffold properties, cell behaviour, and tissue regeneration.

Inversion for 3D Conductivity and Chargeability Models Using EM Data Acquired by the New Airborne TargetEM System in Ontario, Canada

Article

Full-text available

Feb 2024

This paper introduces an original approach to the joint inversion of airborne electromagnetic (EM) data for three-dimensional (3D) conductivity and chargeability models using hybrid finite difference (FD) and integral equation (IE) methods. The inversion produces a 3D model of physical parameters, which includes conductivity, chargeability, time constant, and relaxation coefficients. We present the underlying principles of this approach and an example of a high-resolution inversion of the data acquired by a new active time domain airborne EM system, TargetEM, in Ontario, Canada. The new TargetEM system collects high-quality multicomponent data with low noise, high power, and a small transmitter–receiver offset. This airborne system and the developed advanced inversion methodology represent a new effective method for mineral resource exploration.

A Computational Evaluation of Minimum Feature Size in Projection Two-Photon Lithography for Rapid Sub-100 nm Additive Manufacturing

Article

Full-text available

Jan 2024

Two-photon lithography (TPL) is a laser-based additive manufacturing technique that enables the printing of arbitrarily complex cm-scale polymeric 3D structures with sub-micron features. Although various approaches have been investigated to enable the printing of fine features in TPL, it is still challenging to achieve rapid sub-100 nm 3D printing. A key limitation is that the physical phenomena that govern the theoretical and practical limits of the minimum feature size are not well known. Here, we investigate these limits in the projection TPL (P-PTL) process, which is a high-throughput variant of TPL, wherein entire 2D layers are printed at once. We quantify the effects of the projected feature size, optical power, exposure time, and photoinitiator concentration on the printed feature size through finite element modeling of photopolymerization. Simulations are performed rapidly over a vast parameter set exceeding 10,000 combinations through a dynamic programming scheme, which is implemented on high-performance computing resources. We demonstrate that there is no physics-based limit to the minimum feature sizes achievable with a precise and well-calibrated P-TPL system, despite the discrete nature of illumination. However, the practically achievable minimum feature size is limited by the increased sensitivity of the degree of polymer conversion to the processing parameters in the sub-100 nm regime. The insights generated here can serve as a roadmap towards fast, precise, and predictable sub-100 nm 3D printing.

Architectural transformations in distributed telecommunications service systems and problems of ensuring information security

Article

Dec 2023

The article discusses the architectural transformations of distributed tele-communications service systems and methods of optimizing their reliability and efficiency. Modern distributed service-oriented networks are presented as complex heterogeneous systems, most of which are currently based on so-called cloud technologies. Cloud service systems were analyzed as an alternative to business customers purchasing their own powerful computing systems, software, and storage technologies. The principle of sharing these resources based on their virtualization was proposed. The main problems and ways of ensuring safety of information in these systems are provided.

Parallelized Inter-Image k-Means Clustering Algorithm for Unsupervised Classification of Series of Satellite Images

Article

Full-text available

Dec 2023

As the volume of satellite images increases rapidly, unsupervised classification can be utilized to swiftly investigate land cover distributions without prior knowledge and to generate training data for supervised (or deep learning-based) classification. In this study, an inter-image k-means clustering algorithm (IIkMC), as an improvement of the native k-means clustering algorithm (kMC), was introduced to obtain a single set of class signatures so that the classification results could be compatible among multiple images. Because IIkMC was a computationally intensive algorithm, parallelized approaches were deployed, using multi-cores of a central processing unit (CPU) and a graphics processing unit (GPU), to speed up the process. kMC and IIkMC were applied to a series of images acquired in a PlanetScope mission. In addition to the capability of the inter-image compatibility of the classification results, IIkMC could settle the problem of incomplete segmentation and class canceling revealed in kMC. Based on CPU parallelism, the speed of IIkMC improved, becoming up to 12.83 times better than sequential processing. When using a GPU, the speed improved up to 25.53 times, rising to 39.00 times with parallel reduction. From the results, it was confirmed IIkMC provided more reliable results than kMC, and its parallelism could facilitate the overall inspection of multiple images.

Parallel Computing and Scaling with Dask

Chapter

Jun 2023

Maurizio Petrelli

This chapter introduces basic concepts and definitions of parallel computing and model scaling. It starts by providing basic definitions and terminology and then introduces Daks, a Python library that provides object scalability to Python scientific libraries such as pandas, NumPy, and scikit-learn.

Synthesis of a large area ReS2 thin film by CVD for in-depth investigation of resistive switching: effects of metal electrodes, channel width and noise behaviour

Article

Aug 2023
Nanoscale

The anisotropic crystal structure and layer independent electrical and optical properties of ReS2 make it unique among other two-dimensional materials (2DMs), emphasizing a special need for its synthesis. This work discusses the synthesis and in-depth characterization of a 1 × 1 cm2 large and few layered ReS2 film. Vibrational modes and excitonic peaks observed from the Raman and photoluminescence (PL) spectra corroborated the formation of a ReS2 film with a 1.26 eV bandgap. High resolution transmission electron microscopy (HRTEM) images and selected area electron diffraction (SAED) patterns inferred the polycrystalline nature of the film, while cross-sectional field emission scanning electron microscopy (FESEM) indicated planar growth with ∼10 nm thickness. The chemical composition of the film analysed through X-ray photoelectron spectroscopy (XPS) indicated the formation of a ReS2 film with a Re : S atomic ratio of 1 : 1.75, indicating a small amount of non-stoichiometric RexSy. Following the basic characterization studies, the ReS2 film was tested for resistive switching (RS) device application in which the effects of different metal electrodes (Pt/Au and Ag/Au) and different channel widths (200, 100, and 50 μm) were studied. The highest memory window equal to 108 was obtained for the Ag/Au electrode while Pt/Au showed a memory window of 102. RS for the former was ascribed to the formation of a conducting filament (CF) because of the migration of Ag+ ions, while defect mediated charge carrier transport led to switching in the Pt/Au electrode. Furthermore, the RHRS/RLRS ratio achieved in this work (108) is also of the highest magnitude reported thus far. Furthermore, a comparison of devices with Ag/Au electrodes but with different channel widths (50, 100 and 200 μm) gave insightful results on the existence of multiple resistance states, device endurance and retention. An inverse relationship between the retention time and the device's channel width was observed, where the device with a 50 μm channel width showed a retention time of 48 hours, and the one with a 200 μm width showed stability only up to 3000 s. Furthermore, low frequency noise measurements were performed to understand the effect of defects in the low resistance state (LRS) and the high resistance state (HRS). The HRS exhibited Lorentzian noise behaviour while the LRS exhibited Lorentzian only at low current bias which converged to 1/f noise at higher current bias.

Process Discovery Techniques Recommendation Framework

Article

Full-text available

Jul 2023

In a competitive environment, organizations need to continuously understand, analyze and improve the behavior of processes to maintain their position in the market. Process mining is a set of techniques that allows organizations to have an X-ray view of their processes by extracting process related knowledge from the information recorded in today’s process aware information systems such as ‘Enterprise Resource Planning’ systems, ‘Business Process Management’ systems, ‘Supply Chain Management’ systems, etc. One of the major categories of process mining techniques is the process of discovery. This later allows for automatically constructing process models just from the information stored in the system representing the real behavior of the process discovered. Many process discovery algorithms have been proposed today which made users and businesses, in front of many techniques, unable to choose or decide the appropriate mining algorithm for their business processes. Moreover, existing evaluation and recommendation frameworks have several important drawbacks. This paper proposes a new framework for recommending the most suitable process discovery technique to a given process taking into consideration the limitations of existing frameworks.

Towards an Energy Complexity Model for Distributed Data Processing Algorithms

Article

Dec 2023

Modern data centers exist as infrastructure in the era of big data. Big data processing applications are the major computing workload of data centers. Electricity cost accounts for about 50% of data centers' operational costs. Therefore, the energy consumed for running distributed data processing algorithms on a data center is starting to attract both academia and industry. Most works study the energy consumption from the hardware perspective and only a few of them from the algorithm perspective. A general and hardware-independent energy evaluation model for the algorithms is in demand. With the model, algorithm designers can evaluate the energy consumption, compare energy consumption features and facilitate energy consumption optimization of distributed data processing algorithms. Inspired by the time complexity model, we propose an energy complexity model for describing the trends that an algorithm's energy consumption grows with the algorithm's input size. We argue that a good algorithm, especially for processing big data, should have a ‘small’ energy complexity. We define $E(n)$ to represent the functional relationship that associates an algorithm's input size $n$ with its notional energy consumption $E$ . Based on the well-known abstract Bulk Synchronous Parallel (BSP) computer and programming model, we present a complete $E(n)$ solution, including abstraction, generalization, quantification, derivation, comparison, analysis, examples, verification, and applications. Comprehensive experimental analysis shows that the proposed energy complexity model is practical, interestingly, and not equivalent to time complexity.

Evaluation of heat transfer in porous scaffolds under cryogenic treatment: a numerical study

Article

May 2023
MED BIOL ENG COMPUT

The present work had evaluated the effect of cryogenic treatment (233 K) on the degradation of polymeric biomaterial using a numerical model. The study on effect of cryogenic temperature on mechanical properties of cell-seeded biomaterials is very limited. However, no study had reported material degradation evaluation. Different structures of silk-fibroin-poly-electrolyte complex (SFPEC) scaffolds had been designed by varying hole distance and hole diameter, with reference to existing literature. The size of scaffolds were maintained at 5 [Formula: see text] 5 mm2. Current study evaluates the effect of cryogenic temperature on mechanical properties (corelated to degradation) of scaffold. Six parameters related to scaffold degradation: heat transfer, deformation gradient, stress, strain, strain tensor, and displacement gradient were analyzed for three different cooling rates (- 5 K/min, - 2 K/min, and - 1 K/min). Scaffold degradation had been evaluated in the presence of water and four different concentrations of cryoprotectant solution. Heat distribution at various points (points_base, point_wall and point_core) on the region of interest (ROI) was found similar for different cooling rates of the system. Thermal stress was found developing proportional to cooling rate, which leads to minimal variation in thermal stress over time. Strain tensor was found gradually decreasing due to attenuating response of deformation gradient. In addition to that, dipping down of cryogenic temperature had prohibited the movement of molecules in the crystalline structure which had restricting the displacement gradient. It was found that uniform distribution of desired heat at different cooling rates has the ability to minimize the responses of other scaffold degradation parameters. It was found that the rates of change in stress, strain, and strain tensor were minimal at different concentrations of cryoprotectant. The present study had predicted the degradation behavior of PEC scaffold under cryogenic temperature on the basis of explicit mechanical properties.

The Faster the Better? Optimal Warm-Up Strategies for a Micro Combined Heat and Power Plant

Article

Full-text available

May 2023

The warm-up process is a critical operation phase for mCHP plants, directly impacting their efficiency, reliability, and lifetime. As small decentralized power generation units are increasingly expected to be operated on demand, start-ups will occur more frequently and thus the importance of the warm-up process will further increase. In this study, we address this problem by presenting a mathematical optimization framework that finds optimal actuator trajectories that significantly reduce the warm-up time and improve the thermal efficiency of an mCHP plant. The proposed optimization framework is highly flexible and adaptable to various objective functions, such as maximizing efficiency or minimizing the deviation from desired temperature references. The underlying mathematical model has been experimentally validated on a physical mCHP test rig. Selected case studies further demonstrate the effectiveness and flexibility of the framework and show that with the optimized actuator trajectories, the mCHP plant can reach its steady-state operating temperature in 40% less time. The results also indicate that the shortest warm-up time does not necessarily lead to the highest thermal efficiency. Accordingly, the methodology proposed in this paper provides a powerful tool to study higher-level operational strategies of mCHP plants and thus to maximize their overall performance, which directly translates into an improved operational cost-effectiveness, particularly in demand-driven energy landscapes.

Thermal Intra-Layer Interaction of Discretized Fractal Exposure Strategies in Non-Isothermal Powder Bed Fusion of Polypropylene

Article

Full-text available

Mar 2023

Additive manufacturing of material systems sensitive to heat degradation represents an essential prerequisite for the integration of novel functionalized material systems in medical applications , such as the hybrid processing of high-performance thermoplastics and gelling polymers. For enabling an inherent process stability under non-isothermal conditions at reduced ambient temperatures in laser-based additive manufacturing, maintaining a homogeneous layer formation is of vital significance. To minimize crystallization-induced deflections of formed layers while avoiding support structures, the temporal and spatial discretization of the melting process is combined with the subsequent quenching of the polymer melt due to thermal conduction. Based on implementing superposed, phase-shifted fractal curves as the underlying exposure structure, the locally limited temporal and spatial discretization of the exposure process promotes a mesoscale compensation of crystallization shrinkage and thermal distortion, enabling the essential homogeneous layer formation. For improving the understanding of local parameter-dependent thermal intra-layer interactions under non-isothermal processing conditions, geometric boundary conditions of distinct exposure vectors and the underlying laser power are varied. Applying polypropylene as a model material, a significant influence of the spatial distance of fractal exposure structures on the thermal superposition of distinct exposure vectors can be derived, implicitly influencing temporal and temperature-dependent characteristics of the material crystallization and the emerging thermal material exposure. Furthermore, the formation of sub-focus structures can be observed, contributing to the spatial discretization of the layer formation, representing a decisive factor that influences the structure formation and mesoscopic part properties in non-isothermal powder bed fusion of polymers. Consequently, the presented approach represents a foundation for the support-free, accelerated non-isothermal additive manufacturing of both polymers and metals, demonstrating a novel methodology for the mesoscale compensation of thermal shrinkage.

An Aggregation-Based Algebraic Multigrid Method with Deflation Techniques and Modified Generic Factored Approximate Sparse Inverses

Article

Full-text available

Jan 2023

In this paper, we examine deflation-based algebraic multigrid methods for solving large systems of linear equations. Aggregation of the unknown terms is applied for coarsening, while deflation techniques are proposed for improving the rate of convergence. More specifically, the V-cycle strategy is adopted, in which, at each iteration, the solution is computed by initially decomposing it utilizing two complementary subspaces. The approximate solution is formed by combining the solution obtained using multigrids and deflation. In order to improve performance and convergence behavior, the proposed scheme was coupled with the Modified Generic Factored Approximate Sparse Inverse preconditioner. Furthermore, a parallel version of the multigrid scheme is proposed for multicore parallel systems, improving the performance of the techniques. Finally, characteristic model problems are solved to demonstrate the applicability of the proposed schemes, while numerical results are given.

Research and Analysis of Computing Cluster Configuration Management Systems

Chapter

Jan 2023

The article discusses the issues of research and analysis of computer cluster configuration management systems. The question of the evolution of cluster systems, the features of its architecture and possible configurations are touched upon. The main purpose of the article is to study and analyze the configuration management systems of a computing cluster. The main tasks considered in the article are: research, classification of parallel computing systems and determination of the place of cluster systems among them; study of the features of cluster architecture and its configurations. The theory of configuration management systems is considered, the four most currently used configuration management systems are investigated: Puppet, Chef, SaltStack and Ansible. Based on the configuration management systems comparison methodology, the study and comparison of these systems were carried out. Such indicators as properties related to the input specification, with the implementation of configurations and with the management of specifications are evaluated. Recommendations on the choice of a configuration management system for a computing cluster are proposed. The Ansible system was chosen as an example, and based on it, an overview of basic information about the operation of configuration management systems was conducted. The result of the study is an analysis of the configuration management systems of the computing cluster. Recommendations on the choice of a configuration management system for a computing cluster are proposed. In conclusion, the article presents the main results and conclusions obtained during the study.

A Pipeline Pattern Detection Technique in Polly

Conference Paper

Jan 2023

The Tiny-Tasks Granularity Trade-Off Balancing Overhead vs. Performance in Parallel Systems

Article

Full-text available

Apr 2023

Models of parallel processing systems typically assume that one has $l$ workers and jobs are split into an equal number of $k=l$ tasks. Splitting jobs into $k \gt l$ smaller tasks, i.e. using “tiny tasks”, can yield performance and stability improvements because it reduces the variance in the amount of work assigned to each worker, but as $k$ increases, the overhead involved in scheduling and managing the tasks begins to overtake the performance benefit. We perform extensive experiments on the effects of task granularity on an Apache Spark cluster, and based on these, develop a four-parameter model for task and job overhead that, in simulation, produces sojourn time distributions that match those of the real system. We also present analytical results which illustrate how using tiny tasks improves the stability region of split-merge systems, and analytical bounds on the sojourn and waiting time distributions of both split-merge and single-queue fork-join systems with tiny tasks. Finally we combine the overhead model with the analytical models to produce an analytical approximation to the sojourn and waiting time distributions of systems with tiny tasks which include overhead. We also perform analogous tiny-tasks experiments on a hybrid multi-processor shared memory system based on MPI and OpenMP which has no load-balancing between nodes. Though no longer strict analytical bounds, our analytical approximations with overhead match both the Spark and MPI/OpenMP experimental results very well.

Spectral Acoustic Fingerprints of Sand and Sandstone Sea Bottoms

Article

Full-text available

Dec 2022

Modern studies which dealt with the frequency domain analysis showed that a frequency-domain approach has an essential advantage and mentioned an inner qualitative relationship between the subsurface structure and its frequency spectra. This paper deals with the acoustic spectral response of sand and sandstone sediments at the sea bottom. An acoustic data collection campaign was conducted over two sand sites and two sandstone sites. The analysis of the results shows that reflections of acoustic signals from sand and sandstone sea bottom are characterized by various spectral features in the 2.75–6.75 kHz range. The differences in acoustic response of sand and sandstone can be quantified by examining the maximal normalized reflected power, the mean frequency, and the number of crossings at different power levels. The statistical value distribution of these potential classifiers was calculated and analyzed. These classifiers, and especially the roughness of the spectrum quantified by the number of crossings parameter can give information to assess the probability for sand or sandstone based on the reflected spectra and be used for actual distinction between sand and sandstone in sub bottom profiler data collection campaigns.

Computational complexity when constructing rational plans for program execution in a given field of parallel computers

Article

Dec 2022

Valeriy M. Bakanov

Objectives. The construction of rational plans (schedules) for parallel program execution (PPE) represents a challenging problem due to its ambiguity. The aim of this work is to create methods for developing such plans and specialized software for implementing these methods, which are based on the internal properties of algorithms, primarily on the property of internal (hidden) parallelism. Methods . The main method for developing PPE plans was the construction, analysis, and purposeful transformation of the stacked-parallel form (SPF) of information graphs of algorithms (IGA). The SPF was transformed by transferring operators from tier to tier of the SPF (this event was taken as an elementary step in determining the computational complexity of scenario execution). As a transformation tool, a method for developing transformation scenarios in the scripting programming language Lua was used. Scenarios were created by a heuristic approach using a set of Application Programming Interface (API) functions of the developed software system. These functions formed the basis for a comprehensive study of the parameters of the IGA and its SPF representation for the subsequent construction of a PPE plan applying to a given field of parallel computers. Results. Features of the internal properties of the algorithms that affect the efficiency of SPF transformations were identified during the course of computational experiments. Comparative indices of the computational complexity of obtaining PPE plans and other parameters (including code density, etc.) were obtained for various SPF transformation scenarios. An iterative approach to improving heuristic methods favors developing optimal schemes for solving the objective problem. Conclusions. The developed software system confirmed its efficiency for studying the parameters of hidden parallelism in arbitrary algorithms and rational use in data processing. The approach of using a scripting language to develop heuristic methods (scenarios) for the purposeful transformation of IGA forms showed great flexibility and transparency for the researcher. The target consumers of the developed methods for generating schedules for parallel execution of programs are, first of all, developers of translators and virtual machines, and researchers of the properties of algorithms (for identifying and exploiting the potential of their hidden parallelism). The developed software and methods have been successfully used for a number of years for increasing student competence in data processing parallelization at Russian universities.

Analysis of Dynamic Resource Allocation in Digital Education Ecosystems

Conference Paper

Mar 2022

High-Performance and Parallel Computing Techniques Review: Applications, Challenges and Potentials to Support Net-Zero Transition of Future Grids

Article

Full-text available

Nov 2022

The transition towards net-zero emissions is inevitable for humanity’s future. Of all the sectors, electrical energy systems emit the most emissions. This urgently requires the witnessed accelerating technological landscape to transition towards an emission-free smart grid. It involves massive integration of intermittent wind and solar-powered resources into future power grids. Additionally, new paradigms such as large-scale integration of distributed resources into the grid, proliferation of Internet of Things (IoT) technologies, and electrification of different sectors are envisioned as essential enablers for a net-zero future. However, these changes will lead to unprecedented size, complexity and data of the planning and operation problems of future grids. It is thus important to discuss and consider High Performance Computing (HPC), parallel computing, and cloud computing prospects in any future electrical energy studies. This article recounts the dawn of parallel computation in power system studies, providing a thorough history and paradigm background for the reader, leading to the most impactful recent contributions . The reviews are split into Central Processing Unit (CPU) based, Graphical Processing Unit (GPU) based, and Cloud-based studies and smart grid applications. The state-of-the-art is also discussed, highlighting the issue of standardization and the future of the field. The reviewed papers are predominantly focused on classical imperishable electrical system problems. This indicates the need for further research on parallel and HPC approaches applied to future smarter grid challenges, particularly to the integration of renewable energy into the smart grid.

Jack, The Autotuner

Article

Jul 2022

Richard Vuduc

Among his many technical contributions, automated performance tuning—or autotuning—has been a critical pillar of the research that has emerged from Jack Dongarra's group at the University of Tennessee. This article reflects on the history of that work, including an analysis of the large body of research that has used or cited the ATLAS system since its inception in the late 1990 s.

SDCBench: A Benchmark Suite for Workload Colocation and Evaluation in Datacenters

Article

Full-text available

Sep 2022

Colocating workloads are commonly used in datacenters to improve server utilization. However, the unpredictable application performance degradation caused by the contention for shared resources makes the problem difficult and limits the efficiency of this approach. This problem has sparked research in hardware and software techniques that focus on enhancing the datacenters’ isolation abilities. There is still lack of a comprehensive benchmark suite to evaluate such techniques. To address this problem, we present SDCBench, a new benchmark suite that is specifically designed for workload colocation and characterization in datacenters. SDCBench includes 16 applications that span a wide range of cloud scenarios, which are carefully selected from the existing benchmarks using the clustering analysis method. SDCBench implements a robust statistical methodology to support workload colocation and proposes a concept of latency entropy for measuring the isolation ability of cloud systems. It enables cloud tenants to understand the performance isolation ability in datacenters and choose their best-fitted cloud services. For cloud providers, it also helps them to improve the quality of service to increase their revenues. Experimental results show that SDCBench can simulate different workload colocation scenarios by generating pressures on multidimensional resources with simple configurations. We also use SDCBench to compare the latency entropies in public cloud platforms such as Huawei Cloud and AWS Cloud and a local prototype system FlameCluster-II; the evaluation results show FlameCluster-II has the best performance isolation ability over these three cloud systems, with 0.99 of experience availability and 0.29 of latency entropy.

Research on multithread programming method based on Java programming

Conference Paper

Mar 2022

Analysis of Concurrent Processes in Internet of Things Solutions

Chapter

Jan 2022

The rapid development of ICT has enabled services that did not exist before. Such services include shared vehicles like e-scooters, which offer a quick and simple mobility. A method of analysis for e-scooter and client collaboration is offered to detect e-scooter and client collaboration risks when e-scooter is used concurrently by many clients. The method provides for the creation of an exact description of e-scooter and customer cooperation processes, from which e-scooter and all-executable scenarios are created through symbolic execution. When analyzing the results of scenario execution, e-scooter and client collaboration risks are identified. The obtained result allows e-scooter system developers to prevent or mitigate identified risks, while customers have the possibility to use new transport services effectively but knowing the potential risks.KeywordConcurrent processesRisk analysisInternet of things

Hybrid Parallel-in-Time-and-Space Transient Stability Simulation of Large-Scale AC/DC Grids

Article

Full-text available

Nov 2022
IEEE T POWER SYST

The increasing complexity of modern AC/DC power systems poses a significant challenge to a fast solution of largescale transient stability simulation problems. This paper proposes the hybrid parallel-in-time-and-space (PiT+PiS) transient simulation on the CPU-GPU platform to thoroughly exploit the parallelism from time and spatial perspectives, thereby fully utilizing parallel processing hardware. The respective electromechanical and electromagnetic aspects of the AC and DC grids demand a combination of transient stability (TS) simulation and electromagnetic transient (EMT) simulation to reflect both system-level and equipment-level transients. The TS simulation is performed on GPUs in the co-simulation, while the Parareal parallel-in-time (PiT) scheduling and EMT simulation are conducted on CPUs. Therefore, the heterogeneous CPU-GPU scheme can utilize asynchronous computing features to offset the data transfer latency between different processors. Higher scalability and extensibility than GPU-only dynamic parallelism design is achieved by utilizing concurrent GPU streams for coarse-grid and fine-grid computation. A synthetic AC/DC grid based on IEEE- 118 Bus and CIGRÉ DCS2 systems showed a good accuracy compared to commercial TSAT software, and a speedup of 165 is achieved with 48 IEEE-118 Bus systems and 192 201-Level detail-modeled MMCs. Furthermore, the proposed method is also applicable to multi-GPU implementation where it demonstrates decent efficacy.

Study of access template to graphics engine GM effect on the performance

Article

Jun 2016

The work objective is to study the effect of the graphical processor unit computational cores load level and memory access pattern on the memory bus bandwidth and scaling acceleration. The research subject is the problem of scalability of the parallel computing performance and acceleration. The following hypothesis is checked: while processing images for multi-core shared-memory systems, Gustafson - Barsis’s law is more crucial than the memory access template at the underloading of the GPU cores. The research methodology is a computational experiment with further analysis of the obtained results. The conclusions are as follows. The suggested hypothesis is proved. For that, a series of experiments on various heterogeneous computational systems with OpenCL standard support is conducted. The application field of the results obtained includes the development of algorithms and software for the highly parallel computer systems. The memory access template starts to place certain restrictions on the algorithm efficiency only when the load level of the computational cores is sufficient. Video cards with the private memory show more stable results in comparison to those which share memory with the central processing unit.

Algebraic Modelling of a Generic Fog Scenario for Moving IoT Devices

Chapter

Jul 2021

Moving IoT devices may change their positions at any time, although they may always need to have their remote computing resources as close as possible due to their intrinsic restrictions. This condition makes Fog computing an ideal solution, where hosts may be distributed over the Fog domain, and additionally, the Cloud domain may be used as support. In this context, a generic scenario is presented and the most common actions regarding the management of virtual machines associated with such moving IoT devices are being modelled and verified by algebraic means, focusing on the message exchange among all concerned actors for each action.Roig, Pedro JuanAlcaraz, SalvadorGilly, KatjaJuiz, Carlos

Providing In-depth Performance Analysis for Heterogeneous Task-based Applications with StarVZ

Conference Paper

Jun 2021

Modelling a Plain N-Hypercube Topology for Migration in Fog Computing

Chapter

Jan 2021

Fog Computing deployments need consolidated Data Center infrastructures in order to get optimal performances in those special environments. One of the key points in attaining such achievements may be the implementation of Data Center topologies with enhanced features for a relatively small number of users, although ready for dealing with occasional traffic peaks. In this paper, a plain N-Hypercube switching infrastructure is modelled in different ways, such as using arithmetic, logical and algebraic ways, focusing on its capabilities to manage VM migrations among hosts within such a topology.

Multiparametric Multiextremal Optimization Algorithm Software Simplex Evolution for MATLAB and Simintech Libraries

Conference Paper

May 2021

Coordinated process scheduling algorithms for coupled earth system models

Article

Full-text available

Jun 2021

It is becoming increasingly significant for humans to predict and understand future climate changes using coupled climate system models. Although the performance and scalability of individual physical components have improved over the past few years, coupled climate systems still suffer from low efficiency. This paper focuses on the process scheduling problem for the widely applied coupled earth system model (CESM). The proposed resource allocation strategies allow components to execute on a compromised suboptimal setup and still maintain approximately the best parallel speedup. With this flexible resource allocation strategy, we further propose a coordinated process scheduling algorithm (CPSA). More notably, we propose an upgraded version called CPSA-B, which makes efficient resource sharing configurations, including resource allocation and process layout of components. We integrate CPSA and CPSA-B as pre-arrangement tools into the CESM program and deploy them on the Huawei Kunpeng platform. The speedup curves of the CESM components are prepared in advance, based on sampling tests. Experimental data show that CPSA-B reduces up to 58% of the execution time compared with the CESM default strategy. The algorithm has low complexity and can efficiently find solutions for large input sizes.

FreezeNet: Full Performance by Reduced Storage Costs

Chapter

Feb 2021

Pruning generates sparse networks by setting parameters to zero. In this work we improve one-shot pruning methods, applied before training, without adding any additional storage costs while preserving the sparse gradient computations. The main difference to pruning is that we do not sparsify the network’s weights but learn just a few key parameters and keep the other ones fixed at their random initialized value. This mechanism is called freezing the parameters. Those frozen weights can be stored efficiently with a single 32bit random seed number. The parameters to be frozen are determined one-shot by a single for- and backward pass applied before training starts. We call the introduced method FreezeNet. In our experiments we show that FreezeNets achieve good results, especially for extreme freezing rates. Freezing weights preserves the gradient flow throughout the network and consequently, FreezeNets train better and have an increased capacity compared to their pruned counterparts. On the classification tasks MNIST and CIFAR-10/100 we outperform SNIP, in this setting the best reported one-shot pruning method, applied before training. On MNIST, FreezeNet achieves $99.2\%$ performance of the baseline LeNet-5-Caffe architecture, while compressing the number of trained and stored parameters by a factor of $\times 157$.

Fat Tree Algebraic Formal Modelling Applied to Fog Computing

Chapter

Oct 2020

Fog computing brings distributed computing resources closer to end users, thus allowing for better performance in internet of things applications. In this context, if all necessary resources worked together in an autonomous manner, there might be no need for an orchestrator to manage the whole process, as long as there is no Cloud or Edge infrastructure involved. This way, control messages would not flood the entire network and more efficiency would be achieved. In this paper, a framework composed by a string of sequential wireless relays is presented, each one being attached to a fog computing node and all of those being interconnected by a fat tree architecture. To start with, all items involved in that structure are classified into different layers, and in turn, they are modelled by using Algebra of Communicating Processes. At this stage, a couple of scenarios are being proposed: first, an ideal one where the physical path always takes the same direction and storage space is not an issue, and then, a more realistic one where the physical path may take both directions and there may be storage constraints.

Parallel Processing, 1980 to 2020

Article

Oct 2020

This historical survey of parallel processing from 1980 to 2020 is a follow-up to the authors' 1981 Tutorial on Parallel Processing, which covered the state of the art in hardware, programming languages, and applications. Here, we cover the evolution of the field since 1980 in: parallel computers, ranging from the Cyber 205 to clusters now approaching an exaflop, to multicore microprocessors, and Graphic Processing Units (GPUs) in commodity personal devices; parallel programming notations such as OpenMP, MPI message passing, and CUDA streaming notation; and seven parallel applications, such as finite element analysis and computer vision. Some things that looked like they would be major trends in 1981, such as big Single Instruction Multiple Data arrays disappeared for some time but have been revived recently in deep neural network processors. There are now major trends that did not exist in 1980, such as GPUs, distributed memory machines, and parallel processing in nearly every commodity device. This book is intended for those that already have some knowledge of parallel processing today and want to learn about the history of the three areas. In parallel hardware, every major parallel architecture type from 1980 has scaled-up in performance and scaled-out into commodity microprocessors and GPUs, so that every personal and embedded device is a parallel processor. There has been a confluence of parallel architecture types into hybrid parallel systems. Much of the impetus for change has been Moore's Law, but as clock speed increases have stopped and feature size decreases have slowed down, there has been increased demand on parallel processing to continue performance gains. In programming notations and compilers, we observe that the roots of today's programming notations existed before 1980. And that, through a great deal of research, the most widely used programming notations today, although the result of much broadening of these oots, remain close to target system architectures allowing the programmer to almost explicitly use the target's parallelism to the best of their ability. The parallel versions of applications directly or indirectly impact nearly everyone, computer expert or not, and parallelism has brought about major breakthroughs in numerous application areas. Seven parallel applications are studied in this book.

Iteration-based adjoint method for the sensitivity analysis of static aeroelastic loads

Article

Full-text available

Dec 2020
STRUCT MULTIDISCIP O

The sensitivity analysis of static aeroelastic loads can be analytically performed by virtue of a modified stiffness matrix. However, introducing a modified stiffness matrix into the calculation will incur extra computational cost. Additionally, the intrinsic drawback of the direct method is disadvantageous for the sensitivity calculation with a large number of design variables. This paper therefore presents a novel iteration-based adjoint method for the sensitivity analysis of static aeroelastic loads acting on flexible wing, the basis of which is the static aeroelastic calculation via loosely coupled iteration between the potential-flow panel model and the structural linear finite element model. By using an iterative approach for evaluating the adjoint variable, modification to the original stiffness matrix can be obviated. Moreover, this method is more competent for the structural sensitivity analysis with a very large number of design variables, since the adjoint variable is unrelated to the designs. A rectangular wing and a swept wing are employed to demonstrate the verification of the algorithms. The design sensitivities of applied nodal forces on the structure, lift per unit span, total lift and root bending moment are calculated and analyzed. The computational cost is also discussed to further demonstrate the efficiency of the proposed method.

ResearchGate has not been able to resolve any references for this publication.

Encyclopedia of Parallel Computing

Abstract

Chapters (100)

Recommended publications

Parallelizing a multi-frame blind deconvolution algorithm on clusters of multicore processors

Tianhe-1A Interconnect and Message-Passing Services

Hybrid MPI/Pthreads Parallelization of the RAxML Phylogenetics Code

Streamline Integration Using MPI-Hybrid Parallelism on a Large Multicore Architecture