Table 1. Recommended parameter settings for two current GPU architectures: Nvidia's Tesla and Fermi.

Source publication
Article
Full-text available
The 3-year European FP7 project PEPPHER addresses efficient utilization and usage of hybrid (heterogeneous) computer systems consisting of multi-core CPUs with GPU-type accelerators. PEPPHER is concerned with two major aspects: programmability and efficiency on given heterogeneous systems, and code and performance portability between different hete...

Context in source publication

Context 1
... highly nontrivial, auto-tunable GPU sorting algorithm developed by N. Leischner et al. is an example of the level of adaptable, portable performance that algorithm engineering experts can achieve [8]. Table 1 summarizes some of the Nvidia Tesla and Fermi architectures' basic, performance-determining parameters. Based on these, the (sorting) algorithm developer infers tunable, algorithmic parameters related to these architectural features (see Table 2). ...
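The excerpt above describes inferring tunable algorithmic parameters from basic architectural ones. The following Python sketch is purely illustrative of that idea; the field names, formulas, and numeric values are assumptions, not the actual contents of Tables 1 and 2. An autotuner would refine such derived values empirically on the target GPU.

# Hypothetical sketch: deriving tunable sorting parameters from
# architectural parameters, in the spirit of Tables 1 and 2.
# Field names, formulas, and numbers are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class GpuArchParams:
    # Basic, performance-determining parameters (illustrative values).
    num_sms: int            # number of streaming multiprocessors
    warp_size: int          # threads per warp
    shared_mem_per_sm: int  # bytes of shared memory per SM
    max_threads_per_block: int

@dataclass
class SortTuningParams:
    # Algorithmic parameters an autotuner might expose.
    threads_per_block: int
    elements_per_thread: int
    num_blocks: int

def derive_tuning_params(arch: GpuArchParams, n_elements: int,
                         bytes_per_key: int = 4) -> SortTuningParams:
    # Keep the block size a multiple of the warp size and within the HW limit.
    threads_per_block = min(arch.max_threads_per_block, 8 * arch.warp_size)
    # Fit the per-block working set into shared memory.
    max_elems_in_shared = arch.shared_mem_per_sm // bytes_per_key
    elements_per_thread = max(1, max_elems_in_shared // threads_per_block)
    # Launch enough blocks to occupy all SMs.
    per_block = threads_per_block * elements_per_thread
    num_blocks = max(arch.num_sms, (n_elements + per_block - 1) // per_block)
    return SortTuningParams(threads_per_block, elements_per_thread, num_blocks)

# Example: a Fermi-like configuration (illustrative numbers only).
fermi_like = GpuArchParams(num_sms=14, warp_size=32,
                           shared_mem_per_sm=48 * 1024,
                           max_threads_per_block=1024)
print(derive_tuning_params(fermi_like, n_elements=1 << 20))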

Similar publications

Article
Full-text available
Recent advancement in the field of structural biology has generated a huge volume of data, and analyzing such data is vital to know the hidden truths of life, but such analysis is compute intensive in nature and requires huge computational power resulting in extensive use of high performance computing (Multi Core Computing, GPGPU Computing, CPU-GPU Hybr...
Article
Full-text available
This article introduces a computational hybrid one-step technique designed for solving initial value differential systems of first order, which utilizes second derivative function evaluations. The method incorporates three intra-step symmetric points that are calculated to provide an optimum version of the suggested scheme. By combining the hybri...
Article
Full-text available
The differential characteristics with high probability are critical for differential cryptanalysis. The process of searching such differential characteristics, especially the best one, is time-consuming. We believe that the modern hybrid computing systems can be used to accelerate the search process. However, to the best of our knowledge, the exist...
Article
Full-text available
Two-dimensional expansion of a plume, induced by short-pulse laser irradiation of a bottom of a cylindrical cavity in a copper target, into argon background gas at pressure ranging from 0 to 1 bar is studied numerically based on a hybrid computational model that includes a heat conduction equation for the irradiated target and a kinetic model of th...
Article
Full-text available
The authors develop an approach to a “best” time path for Autonomous Underwater Vehicles conducting oceanographic measurements under uncertain current flows. The numerical optimization tool DIDO is used to compute hybrid minimum time and optimal survey paths for a sample of currents between ebb and flow. A simulated meta-experiment is performed whe...

Citations

... The idea of streamlining the deployment of complex pipelines on heterogeneous hardware while maintaining performance portability has previously been explored by the PEPPHER component model [30,31]. This model addresses some of the problems outlined in this paper by using a task-based approach capable of generating multiple implementations of a task and switching between them at runtime in a performance-aware manner. ...
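The runtime implementation switching mentioned above can be illustrated with a small, hypothetical Python sketch: a task is registered with several implementation variants, the runtime measures each, and then keeps choosing the historically fastest one. The class names and the selection policy are invented for illustration; this is not the actual PEPPHER component model API.

# Hypothetical sketch of a multi-variant task with performance-aware
# selection at runtime. Names and policy are invented for illustration.

import time
from collections import defaultdict

class MultiVariantTask:
    def __init__(self):
        self.variants = {}                      # name -> callable
        self.history = defaultdict(list)        # name -> measured runtimes

    def register(self, name, fn):
        self.variants[name] = fn

    def run(self, data):
        # Explore each variant once, then exploit the historically fastest.
        untried = [n for n in self.variants if not self.history[n]]
        if untried:
            name = untried[0]
        else:
            name = min(self.history, key=lambda n: min(self.history[n]))
        start = time.perf_counter()
        result = self.variants[name](data)
        self.history[name].append(time.perf_counter() - start)
        return name, result

task = MultiVariantTask()
task.register("sum_plain", sum)                                   # variant A
task.register("sum_split", lambda xs: sum(xs[::2]) + sum(xs[1::2]))  # variant B
for _ in range(3):
    print(task.run(list(range(100000)))[0])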
Article
Full-text available
Modern computers are typically heterogeneous devices—besides the standard central processing unit (CPU), they commonly include an accelerator such as a graphics processing unit (GPU). However, exploiting the full potential of such computers is challenging, especially when complex workloads consisting of multiple computationally demanding tasks are to be processed. This paper proposes a framework called Umpalumpa, which aims to manage complex workloads on heterogeneous computers. Umpalumpa combines three aspects that ease programming and optimize code performance. Firstly, it implements a data-centric design, where data are described by their physical properties (e.g., location in memory, size) and logical properties (e.g., dimensionality, shape, padding). Secondly, Umpalumpa utilizes task-based parallelism to schedule tasks on heterogeneous nodes. Thirdly, tasks can be dynamically autotuned on a source code level according to the hardware where the task is executed and the processed data. Altogether, Umpalumpa allows for implementing a complex workload, which is automatically executed on CPUs and accelerators, and allows autotuning to maximize the performance with the given hardware and data input. Umpalumpa focuses on image processing workloads, but the concept is generic and can be extended to different types of workloads. We demonstrate the usability of the proposed framework on two previously accelerated applications from cryogenic electron microscopy: 3D Fourier reconstruction and Movie alignment. We show that, compared to the original implementations, Umpalumpa reduces the complexity and improves the maintainability of the main applications' loops while improving performance through automatic memory management and autotuning of the GPU kernels.
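To make the data-centric idea above concrete, here is a minimal, hypothetical Python sketch of how data might be described by physical and logical properties and handed to a task scheduler. The class and function names are invented for illustration; they are not the actual Umpalumpa API.

# Conceptual sketch only: data described by physical properties (location,
# size) and logical properties (shape, padding), scheduled on a device.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class PhysicalDescriptor:
    location: str      # e.g. "host" or "gpu0"
    size_bytes: int

@dataclass
class LogicalDescriptor:
    shape: Tuple[int, ...]        # logical dimensions
    padded_shape: Tuple[int, ...]

@dataclass
class Payload:
    physical: PhysicalDescriptor
    logical: LogicalDescriptor

def submit(task_name: str, payload: Payload) -> str:
    # A real runtime would pick a CPU or GPU implementation and an autotuned
    # kernel configuration; here we only illustrate the placement decision.
    device = "gpu" if payload.physical.location.startswith("gpu") else "cpu"
    return f"{task_name} scheduled on {device} for shape {payload.logical.shape}"

image = Payload(PhysicalDescriptor("gpu0", 4 * 512 * 512),
                LogicalDescriptor((512, 512), (512, 520)))
print(submit("fourier_reconstruction_step", image))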
... The future work may investigate the performance of additional prediction methods in the context of air pollution. Techniques for parallel processing ( [4,17]), acceleration ( [24,13,6]), and intelligent parameter selection ( [14]) could be studied to further improve the efficiency. ...
Preprint
Full-text available
Air pollution is a worldwide issue that affects the lives of many people in urban areas. It is considered that air pollution may lead to heart and lung diseases. A careful and timely forecast of the air quality could help to reduce the exposure risk for affected people. In this paper, we use a data-driven approach to predict air quality based on historical data. We compare three popular methods for time series prediction: Exponential Smoothing (ES), Auto-Regressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM). Considering prediction accuracy and time complexity, our experiments reveal that for short-term air pollution prediction ES performs better than ARIMA and LSTM.
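As a minimal sketch of the simplest of the three compared methods, the following Python snippet implements simple exponential smoothing for a one-step-ahead forecast. The smoothing factor and the toy readings are illustrative; they do not reproduce the paper's data or model configuration.

# Minimal sketch of simple exponential smoothing (ES), the method reported
# above as the strongest of the three for short-term prediction.

def exponential_smoothing(series, alpha=0.5):
    # Return the smoothed series; the last value is the one-step forecast.
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Toy hourly PM10-like readings (made up for illustration).
readings = [42.0, 45.5, 44.0, 50.2, 48.1, 47.3]
forecast = exponential_smoothing(readings, alpha=0.6)[-1]
print(f"next-hour forecast: {forecast:.1f}")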
... However, the OCR would not provide a significant advantage over existing systems, and we decided not to try to replicate their work in OCR, since the best-case scenario would be merely matching their performance. We considered porting the PEPPHER high-level framework [7] to OCR, to have a side-by-side comparison with StarPU, but early results on a CPU-only version have shown that we are unlikely to significantly outperform StarPU. Instead, we focused on other, more promising areas, especially NUMA, abandoning further development of OpenCL in OCR. ...
... Another application that has been implemented with OCR-Vx is a real-world face detection application we originally developed in the PEPPHER project [7] with a high-level pattern-based programming framework [5] on top of the StarPU runtime system. The application utilizes routines from the open-source computer vision library OpenCV [28], which have been slightly reengineered to conform to the tasking model. ...
Article
Full-text available
Task-based runtime systems are an important branch of parallel programming research, since tasks decouple computation from the compute units, giving the runtime systems greater flexibility than a thread-based solution. This makes it easier to deal with the ever-increasing complexity of parallel architectures by providing a separation of concerns—the specification of parallelism is separated from the implementation of the parallel computations on a specific architecture. The Open Community Runtime is one such system, aimed at large-scale parallel systems. Unlike many other task-based runtime systems, the creators not only provided an implementation but there is also a comprehensive specification document. This has allowed us to create an independent implementation, called OCR-Vx. In this article, we present our experience of developing the runtime system, put our work in the context of the specification and the other implementations, and describe key lessons that we have learned during our work. We discuss the design and implementation issues of task-based runtime systems and applications including task synchronization and scheduling, data management, memory consistency, the relation between shared-memory and distributed-memory runtime systems, NUMA architectures, and heterogeneous systems. The article is aimed at audiences not familiar with OCR, since we believe these lessons could be valuable for developers working on other task-based runtime systems or designing new ones.
... Benkner et al. [27] developed PEPPHER, a programming framework for heterogeneous systems that comprise CPUs and accelerators (such as GPUs or the Intel Xeon Phi). PEPPHER involves source-to-source compilation and a run-time system capable of mapping code components onto an extensible set of target processor architectures. ...
Article
Full-text available
Heterogeneous computing systems provide high performance and energy efficiency. However, to optimally utilize such systems, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance and energy aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications our approach determines a near-optimal host-device distribution of work, number of processing units required and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of reasonable configurations. Furthermore, the performance per Joule estimation of system configurations using our machine learning model is more than 1000× faster compared to the system evaluation by program execution.
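The following Python sketch illustrates the general idea of evaluating only a small fraction of the configuration space with a cheap surrogate model instead of executing the program. Note the simplifications: random sampling stands in for the paper's AI planning heuristics, a made-up analytic function stands in for its trained machine learning model, and the configuration space itself is invented for illustration.

# Illustrative only: score sampled system configurations with a surrogate
# instead of running the program, and keep the best one found.

import itertools, random

def surrogate_perf_per_joule(cfg):
    # Stand-in for a trained ML model: a made-up analytic score.
    host_fraction, cpu_threads, strategy = cfg
    balance = 1.0 - abs(host_fraction - 0.3)          # favour ~30% on host
    threads = min(cpu_threads, 16) / 16.0             # diminishing returns
    bonus = 1.1 if strategy == "dynamic" else 1.0
    return balance * threads * bonus

space = list(itertools.product(
    [i / 10 for i in range(11)],        # fraction of work on the host CPU
    [2, 4, 8, 16, 32],                  # number of CPU threads
    ["static", "dynamic"]))             # scheduling strategy

# Evaluate only a small sample of the space (the paper reports ~7%).
sample = random.sample(space, max(1, len(space) // 14))
best = max(sample, key=surrogate_perf_per_joule)
print("near-optimal configuration:", best)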
... Benkner et al. [4] developed PEPPHER, a programming framework for heterogeneous systems that comprise CPUs and accelerators (such as GPUs or the Intel Xeon Phi). PEPPHER involves source-to-source compilation and a runtime system capable of mapping code components onto an extensible set of target processor architectures. ...
... Huang [17]: no, yes, no, yes, yes, no
Haidar [15]: no, yes, no, no, yes, yes
Kasichayanula [19]: no, yes, no, yes, yes, no
Pereira [30]: no, yes, yes, no, no, yes
Hong [16]: no, yes, no, no, yes, yes
Cerotti [6]: yes, yes, no, no, no, yes
Benkner [4]: yes, yes, no, yes, no, yes
Ge [12]: yes, yes, no, no, yes, yes
Ravi [35]: yes, yes, no, no, no, yes
Grewe [13]: yes, yes, no, yes, no, yes
This paper: yes, yes, yes, yes, yes, yes ...
Preprint
Full-text available
Heterogeneous computing systems provide high performance and energy efficiency. However, to optimally utilize such systems, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance and energy aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications our approach determines a near-optimal host-device distribution of work, number of processing units required and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of reasonable configurations. Furthermore, the performance per Joule estimation of system configurations using our machine learning model is more than 1000x faster compared to the system evaluation by program execution.
... Different target architectures are fully addressed, including GP-GPUs but also FPGAs, which makes the structured programming model effective in dealing with the more advanced architectures available. Finally, structured parallel programming models have been adopted in several EU-funded research projects (SkePU in PEPPHER [4] and Excess, FastFlow in ParaPhrase [37], REPARA and RePhrase projects) that spread adoption across different industrial contexts and therefore ensured a wider diffusion of their concepts. This is the "integration" phase, where concepts born, grown and matured in the structured parallel programming research enclave begin to permeate other communities, based on the consolidated results achieved so far. ...
... Using the rule (map(f))(X) ≡ (seq(∀ x ∈ X do x = f(x)))(X). Using the classical map fusion rule map(f) ∘ map(g) ≡ map(f ∘ g), the composition of two maps computes the same result as a single map of a function which is the composition of the two map functions. ...
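The fusion rule quoted above can be checked on a small example. The Python snippet below shows that applying map(f) after map(g) yields the same result as a single map of the composed function; the concrete functions are arbitrary examples.

# Small illustration of the map fusion rule:
# map(f) . map(g) computes the same result as map(f . g).

def compose(f, g):
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: 2 * x
xs = [1, 2, 3, 4]

two_maps = list(map(f, map(g, xs)))      # map(f) applied after map(g)
fused    = list(map(compose(f, g), xs))  # a single map of the composition
assert two_maps == fused == [3, 5, 7, 9]
print(two_maps)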
Article
Full-text available
This paper discusses the impact of structured parallel programming methodologies in state-of-the-art industrial and research parallel programming frameworks. We first recap the main ideas underpinning structured parallel programming models and then present the concepts of algorithmic skeletons and parallel design patterns. We then discuss how such concepts have permeated the wider parallel programming community. Finally, we give our personal overview—as researchers active for more than two decades in the parallel programming models and frameworks area—of the process that led to the adoption of these concepts in state-of-the-art industrial and research parallel programming frameworks, and the perspectives they open in relation to the exploitation of forthcoming massively-parallel (both general and special-purpose) architectures.
... The Performance Portability and Programmability for Heterogeneous Many-core Architectures (PEPPHER) project [1] has developed a methodology and framework for programming and optimizing applications for single-node heterogeneous manycore processors to ensure performance portability. With Intel as a key partner in the project, READEX goes one step further and provides a framework that supports the heterogeneity of the system in the form of tuning parameters, which enable large-scale heterogeneous applications to dynamically (and automatically) adapt heterogeneous resources according to run-time requirements. ...
Chapter
As in the embedded systems domain, energy efficiency has recently become one of the main design criteria in high performance computing. The European Union Horizon 2020 project READEX (Run-time Exploitation of Application Dynamism for Energy-efficient eXascale computing) has developed a tools-aided auto-tuning methodology inspired by system scenario based design. Applying similar concepts as those presented in earlier chapters of this book, the dynamic behavior of HPC applications is exploited to achieve improved energy efficiency and performance. Driven by a consortium of European experts from academia, HPC resource providers, and industry, the READEX project has developed the first generic framework of its kind for split design-time and run-time tuning while targeting heterogeneous systems at the Exascale level. Using a real-life boundary element application, energy savings of more than 30% can be shown.
... Parallelization of sequential legacy code as well as writing parallel programs from scratch is not easy, and the difficulty of programming multi-core systems is also known as the programmability wall [57]. The multi-core shift in computer architecture has accelerated the research efforts in developing new programming frameworks for parallel computing, which should assist domain scientists, for instance, by generating and optimising low-level parallel code for coordination of computations across multiple cores and multiple computers [8,58]. ...
... RQ2.8 What are the technologies used to create the language tool suite? RQ2.9 Does the language target specific hardware? ...
Article
V. Amaral, B. Norberto, M. Goulão et al., Programming languages for data-intensive HPC applications: A systematic mapping study, Parallel Computing, https://doi.org/10.1016/j.parco.2019.102584. A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity of the software for HPC, it is useful to identify programming languages that can be used to alleviate this issue. Because the existing literature on the topic of HPC is very dispersed, we performed a Systematic Mapping Study (SMS) in the context of the European COST Action cHiPSet. This literature study maps characteristics of various programming languages for data-intensive HPC applications, including category, typical user profiles, effectiveness, and type of articles. We organised the SMS in two phases. In the first phase, relevant articles are identified employing an automated keyword-based search in eight digital libraries. This led to an initial sample of 420 papers, which was then narrowed down in a second phase by human inspection of article abstracts, titles and keywords to 152 relevant articles published in the period 2006–2018. The analysis of these articles enabled us to identify 26 programming languages referred to in 33 of the relevant articles. We compared the outcome of the mapping study with results of our questionnaire-based survey that involved 57 HPC experts. The mapping study and the survey revealed that the desired features of programming languages for data-intensive HPC applications are portability, performance and usability. Furthermore, we observed that the majority of the programming languages used in the context of data-intensive HPC applications are text-based general-purpose programming languages. Typically these have a steep learning curve, which makes them difficult to adopt. We believe that the outcome of this study will inspire future research and development in programming languages for data-intensive HPC applications. Keywords: High-performance computing (HPC), Big data, Data-intensive applications, Programming languages, Domain-specific language (DSL), General-purpose language (GPL), Systematic mapping study (SMS).
... Execution of these experiments requires on average 158.7 seconds. One way to improve the execution time is by utilizing high-performance parallel computing systems [19], [20], [21]; however, this is out of the scope of this paper. ...
Preprint
Determining the optimal location of control cabinet components requires the exploration of a large configuration space. For real-world control cabinets it is impractical to evaluate all possible cabinet configurations. Therefore, we need to apply methods for intelligent exploration of the cabinet configuration space that enable finding a near-optimal configuration without evaluating all possible configurations. In this paper, we describe an approach for multi-objective optimization of control cabinet layout that is based on Pareto Simulated Annealing. Optimization aims at minimizing the total wire length used for interconnection of components and the heat convection within the cabinet. We simulate heat convection to study the warm air flow within the control cabinet and determine the optimal position of components that generate heat during operation. We evaluate and demonstrate the effectiveness of our approach empirically for various control cabinet sizes and usage scenarios.
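As a rough illustration of the optimization setting described above, the Python sketch below runs a bi-objective simulated annealing loop over component placements. It uses a simple weighted-sum scalarization rather than the Pareto acceptance rule of Pareto Simulated Annealing, and the layout encoding, objective functions, and cooling schedule are invented placeholders (the paper's actual heat objective comes from simulating warm air flow).

# Conceptual sketch of a bi-objective simulated-annealing loop over
# cabinet layouts; objectives and cooling schedule are placeholders.

import math, random

def wire_length(layout):
    # Placeholder: sum of distances between consecutive components.
    return sum(abs(a - b) for a, b in zip(layout, layout[1:]))

def heat_score(layout):
    # Placeholder: treat higher component indices as hotter and penalise
    # placing them early in the layout.
    return sum(comp * (len(layout) - pos) for pos, comp in enumerate(layout))

def scalarized(layout, w=0.5):
    # Weighted sum of the two objectives (not a true Pareto acceptance rule).
    return w * wire_length(layout) + (1 - w) * heat_score(layout)

def anneal(n_components=8, steps=2000, t0=50.0):
    layout = list(range(n_components))
    random.shuffle(layout)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6
        cand = layout[:]
        i, j = random.sample(range(n_components), 2)
        cand[i], cand[j] = cand[j], cand[i]          # swap two components
        delta = scalarized(cand) - scalarized(layout)
        if delta < 0 or random.random() < math.exp(-delta / t):
            layout = cand
    return layout, wire_length(layout), heat_score(layout)

print(anneal())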
... Many approaches have been proposed for using HPC systems [18,39,56,60]. While multi-core CPUs are suitable for general-purpose tasks, many-core processors (such as the Intel Xeon Phi [20] or GPUs [58]) comprise a larger number of lower-frequency cores and perform well on scalable applications [54] (such as DNA sequence analysis [55] or deep learning [79]). ...
Chapter
Full-text available
Recent developments in sensor technology, wearable computing, Internet of Things (IoT), and wireless communication have given rise to research in ubiquitous healthcare and remote monitoring of human health and activities. Health monitoring systems involve processing and analysis of data retrieved from smartphones, smart watches, smart bracelets, as well as various sensors and wearable devices. Such systems enable continuous monitoring of patients' psychological and health conditions by sensing and transmitting measurements such as heart rate, electrocardiogram, body temperature, respiratory rate, chest sounds, or blood pressure. Pervasive healthcare, as a relevant application domain in this context, aims at revolutionizing the delivery of medical services through a medical assistive environment and facilitates the independent living of patients. In this chapter, we discuss (1) data collection, fusion, ownership and privacy issues; (2) models, technologies and solutions for