Table 1. Recommended parameter settings for two current GPU architectures: Nvidia's Tesla and Fermi.

Source publication
Article
Full-text available
The 3-year European FP7 project PEPPHER addresses efficient utilization and usage of hybrid (heterogeneous) computer systems consisting of multi-core CPUs with GPU-type accelerators. PEPPHER is concerned with two major aspects: programmability and efficiency on given heterogeneous systems, and code and performance portability between different hete...

Context in source publication

Context 1
... highly nontrivial, auto-tunable GPU sorting algorithm developed by N. Leischner et al. is an example of the level of adaptable, portable performance that algorithm engineering experts can achieve [8]. Table 1 summarizes some of the Nvidia Tesla and Fermi architectures' basic, performance-determining parameters. Based on these, the (sorting) algorithm developer infers tunable, algorithmic parameters related to these architectural features (see Table 2). ...
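The excerpt above describes inferring tunable algorithmic parameters from basic architectural ones. The following Python sketch is purely illustrative of that idea; the field names, formulas, and numeric values are assumptions, not the actual contents of Tables 1 and 2. An autotuner would refine such derived values empirically on the target GPU.

# Hypothetical sketch: deriving tunable sorting parameters from
# architectural parameters, in the spirit of Tables 1 and 2.
# Field names, formulas, and numbers are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class GpuArchParams:
    # Basic, performance-determining parameters (illustrative values).
    num_sms: int            # number of streaming multiprocessors
    warp_size: int          # threads per warp
    shared_mem_per_sm: int  # bytes of shared memory per SM
    max_threads_per_block: int

@dataclass
class SortTuningParams:
    # Algorithmic parameters an autotuner might expose.
    threads_per_block: int
    elements_per_thread: int
    num_blocks: int

def derive_tuning_params(arch: GpuArchParams, n_elements: int,
                         bytes_per_key: int = 4) -> SortTuningParams:
    # Keep the block size a multiple of the warp size and within the HW limit.
    threads_per_block = min(arch.max_threads_per_block, 8 * arch.warp_size)
    # Fit the per-block working set into shared memory.
    max_elems_in_shared = arch.shared_mem_per_sm // bytes_per_key
    elements_per_thread = max(1, max_elems_in_shared // threads_per_block)
    # Launch enough blocks to occupy all SMs.
    per_block = threads_per_block * elements_per_thread
    num_blocks = max(arch.num_sms, (n_elements + per_block - 1) // per_block)
    return SortTuningParams(threads_per_block, elements_per_thread, num_blocks)

# Example: a Fermi-like configuration (illustrative numbers only).
fermi_like = GpuArchParams(num_sms=14, warp_size=32,
                           shared_mem_per_sm=48 * 1024,
                           max_threads_per_block=1024)
print(derive_tuning_params(fermi_like, n_elements=1 << 20))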

Similar publications

Article
Full-text available
Recent advancement in the field of structural biology has generated a huge volume of data, and analyzing such data is vital to know the hidden truths of life, but such analysis is compute intensive in nature and requires huge computational power resulting in extensive use of high performance computing (Multi Core Computing, GPGPU Computing, CPU-GPU Hybr...
Article
Full-text available
This article introduces a computational hybrid one-step technique designed for solving initial value differential systems of first order, which utilizes second derivative function evaluations. The method incorporates three intra-step symmetric points that are calculated to provide an optimum version of the suggested scheme. By combining the hybri...
Article
Full-text available
The differential characteristics with high probability are critical for differential cryptanalysis. The process of searching such differential characteristics, especially the best one, is time-consuming. We believe that the modern hybrid computing systems can be used to accelerate the search process. However, to the best of our knowledge, the exist...
Article
Full-text available
Two-dimensional expansion of a plume, induced by short-pulse laser irradiation of a bottom of a cylindrical cavity in a copper target, into argon background gas at pressure ranging from 0 to 1 bar is studied numerically based on a hybrid computational model that includes a heat conduction equation for the irradiated target and a kinetic model of th...
Article
Full-text available
The authors develop an approach to a “best” time path for Autonomous Underwater Vehicles conducting oceanographic measurements under uncertain current flows. The numerical optimization tool DIDO is used to compute hybrid minimum time and optimal survey paths for a sample of currents between ebb and flow. A simulated meta-experiment is performed whe...

Citations

... The idea of streamlining the deployment of complex pipelines on heterogeneous hardware while maintaining performance portability has previously been explored by the PEPPHER component model [30,31]. This model addresses some of the problems outlined in this paper by using a task-based approach capable of generating multiple implementations of a task and switching between them at runtime in a performance-aware manner. ...
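The runtime implementation switching mentioned above can be illustrated with a small, hypothetical Python sketch: a task is registered with several implementation variants, the runtime measures each, and then keeps choosing the historically fastest one. The class names and the selection policy are invented for illustration; this is not the actual PEPPHER component model API.

# Hypothetical sketch of a multi-variant task with performance-aware
# selection at runtime. Names and policy are invented for illustration.

import time
from collections import defaultdict

class MultiVariantTask:
    def __init__(self):
        self.variants = {}                      # name -> callable
        self.history = defaultdict(list)        # name -> measured runtimes

    def register(self, name, fn):
        self.variants[name] = fn

    def run(self, data):
        # Explore each variant once, then exploit the historically fastest.
        untried = [n for n in self.variants if not self.history[n]]
        if untried:
            name = untried[0]
        else:
            name = min(self.history, key=lambda n: min(self.history[n]))
        start = time.perf_counter()
        result = self.variants[name](data)
        self.history[name].append(time.perf_counter() - start)
        return name, result

task = MultiVariantTask()
task.register("sum_plain", sum)                                   # variant A
task.register("sum_split", lambda xs: sum(xs[::2]) + sum(xs[1::2]))  # variant B
for _ in range(3):
    print(task.run(list(range(100000)))[0])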
Article
Full-text available
Modern computers are typically heterogeneous devices—besides the standard central processing unit (CPU), they commonly include an accelerator such as a graphics processing unit (GPU). However, exploiting the full potential of such computers is challenging, especially when complex workloads consisting of multiple computationally demanding tasks are to be processed. This paper proposes a framework called Umpalumpa, which aims to manage complex workloads on heterogeneous computers. Umpalumpa combines three aspects that ease programming and optimize code performance. Firstly, it implements a data-centric design, where data are described by their physical properties (e.g., location in memory, size) and logical properties (e.g., dimensionality, shape, padding). Secondly, Umpalumpa utilizes task-based parallelism to schedule tasks on heterogeneous nodes. Thirdly, tasks can be dynamically autotuned on a source code level according to the hardware where the task is executed and the processed data. Altogether, Umpalumpa allows for implementing a complex workload, which is automatically executed on CPUs and accelerators, and allows autotuning to maximize the performance with the given hardware and data input. Umpalumpa focuses on image processing workloads, but the concept is generic and can be extended to different types of workloads. We demonstrate the usability of the proposed framework on two previously accelerated applications from cryogenic electron microscopy: 3D Fourier reconstruction and Movie alignment. We show that, compared to the original implementations, Umpalumpa reduces the complexity and improves the maintainability of the main applications' loops while improving performance through automatic memory management and autotuning of the GPU kernels.
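To make the data-centric idea above concrete, here is a minimal, hypothetical Python sketch of how data might be described by physical and logical properties and handed to a task scheduler. The class and function names are invented for illustration; they are not the actual Umpalumpa API.

# Conceptual sketch only: data described by physical properties (location,
# size) and logical properties (shape, padding), scheduled on a device.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class PhysicalDescriptor:
    location: str      # e.g. "host" or "gpu0"
    size_bytes: int

@dataclass
class LogicalDescriptor:
    shape: Tuple[int, ...]        # logical dimensions
    padded_shape: Tuple[int, ...]

@dataclass
class Payload:
    physical: PhysicalDescriptor
    logical: LogicalDescriptor

def submit(task_name: str, payload: Payload) -> str:
    # A real runtime would pick a CPU or GPU implementation and an autotuned
    # kernel configuration; here we only illustrate the placement decision.
    device = "gpu" if payload.physical.location.startswith("gpu") else "cpu"
    return f"{task_name} scheduled on {device} for shape {payload.logical.shape}"

image = Payload(PhysicalDescriptor("gpu0", 4 * 512 * 512),
                LogicalDescriptor((512, 512), (512, 520)))
print(submit("fourier_reconstruction_step", image))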
... The future work may investigate the performance of additional prediction methods in the context of air pollution. Techniques for parallel processing ( [4,17]), acceleration ( [24,13,6]), and intelligent parameter selection ( [14]) could be studied to further improve the efficiency. ...
Preprint
Full-text available
Air pollution is a worldwide issue that affects the lives of many people in urban areas. It is considered that air pollution may lead to heart and lung diseases. A careful and timely forecast of the air quality could help to reduce the exposure risk for affected people. In this paper, we use a data-driven approach to predict air quality based on historical data. We compare three popular methods for time series prediction: Exponential Smoothing (ES), Auto-Regressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM). Considering prediction accuracy and time complexity, our experiments reveal that for short-term air pollution prediction ES performs better than ARIMA and LSTM.
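As a minimal sketch of the simplest of the three compared methods, the following Python snippet implements simple exponential smoothing for a one-step-ahead forecast. The smoothing factor and the toy readings are illustrative; they do not reproduce the paper's data or model configuration.

# Minimal sketch of simple exponential smoothing (ES), the method reported
# above as the strongest of the three for short-term prediction.

def exponential_smoothing(series, alpha=0.5):
    # Return the smoothed series; the last value is the one-step forecast.
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Toy hourly PM10-like readings (made up for illustration).
readings = [42.0, 45.5, 44.0, 50.2, 48.1, 47.3]
forecast = exponential_smoothing(readings, alpha=0.6)[-1]
print(f"next-hour forecast: {forecast:.1f}")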
... However, the OCR would not provide a significant advantage over existing systems, and we decided not to try to replicate their work in OCR, since the best-case scenario would be merely matching their performance. We considered porting the PEPPHER high-level framework [7] to OCR, to have a side-by-side comparison with StarPU, but early results on a CPU-only version have shown that we are unlikely to significantly outperform StarPU. Instead, we focused on other, more promising areas, especially NUMA, abandoning further development of OpenCL in OCR. ...
... Another application that has been implemented with OCR-Vx is a real-world face detection application we originally developed in the PEPPHER project [7] with a high-level pattern-based programming framework [5] on top of the StarPU runtime system. The application utilizes routines from the open-source computer vision library OpenCV [28], which have been slightly reengineered to conform to the tasking model. ...
Article
Full-text available
Task-based runtime systems are an important branch of parallel programming research, since tasks decouple computation from the compute units, giving the runtime systems greater flexibility than a thread-based solution. This makes it easier to deal with the ever-increasing complexity of parallel architectures by providing a separation of concerns—the specification of parallelism is separated from the implementation of the parallel computations on a specific architecture. The Open Community Runtime is one such system, aimed at large-scale parallel systems. Unlike many other task-based runtime systems, the creators not only provided an implementation but there is also a comprehensive specification document. This has allowed us to create an independent implementation, called OCR-Vx. In this article, we present our experience of developing the runtime system, put our work in the context of the specification and the other implementations, and describe key lessons that we have learned during our work. We discuss the design and implementation issues of task-based runtime systems and applications including task synchronization and scheduling, data management, memory consistency, the relation between shared-memory and distributed-memory runtime systems, NUMA architectures, and heterogeneous systems. The article is aimed at audiences not familiar with OCR, since we believe these lessons could be valuable for developers working on other task-based runtime systems or designing new ones.
... Benkner et al. [27] developed PEPPHER, a programming framework for heterogeneous systems that comprise CPUs and accelerators (such as GPUs or the Intel Xeon Phi). PEPPHER involves source-to-source compilation and a run-time system capable of mapping code components onto an extensible set of target processor architectures. ...
Article
Full-text available
Heterogeneous computing systems provide high performance and energy efficiency. However, to optimally utilize such systems, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance and energy aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications our approach determines a near-optimal host-device distribution of work, number of processing units required and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of reasonable configurations. Furthermore, the performance per Joule estimation of system configurations using our machine learning model is more than 1000× faster compared to the system evaluation by program execution.
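The following Python sketch illustrates the general idea of evaluating only a small fraction of the configuration space with a cheap surrogate model instead of executing the program. Note the simplifications: random sampling stands in for the paper's AI planning heuristics, a made-up analytic function stands in for its trained machine learning model, and the configuration space itself is invented for illustration.

# Illustrative only: score sampled system configurations with a surrogate
# instead of running the program, and keep the best one found.

import itertools, random

def surrogate_perf_per_joule(cfg):
    # Stand-in for a trained ML model: a made-up analytic score.
    host_fraction, cpu_threads, strategy = cfg
    balance = 1.0 - abs(host_fraction - 0.3)          # favour ~30% on host
    threads = min(cpu_threads, 16) / 16.0             # diminishing returns
    bonus = 1.1 if strategy == "dynamic" else 1.0
    return balance * threads * bonus

space = list(itertools.product(
    [i / 10 for i in range(11)],        # fraction of work on the host CPU
    [2, 4, 8, 16, 32],                  # number of CPU threads
    ["static", "dynamic"]))             # scheduling strategy

# Evaluate only a small sample of the space (the paper reports ~7%).
sample = random.sample(space, max(1, len(space) // 14))
best = max(sample, key=surrogate_perf_per_joule)
print("near-optimal configuration:", best)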
... Benkner et al. [4] developed PEPPHER, a programming framework for heterogeneous systems that comprise CPUs and accelerators (such as GPUs or the Intel Xeon Phi). PEPPHER involves source-to-source compilation and a runtime system capable of mapping code components onto an extensible set of target processor architectures. ...
... Huang [17]: no, yes, no, yes, yes, no
Haidar [15]: no, yes, no, no, yes, yes
Kasichayanula [19]: no, yes, no, yes, yes, no
Pereira [30]: no, yes, yes, no, no, yes
Hong [16]: no, yes, no, no, yes, yes
Cerotti [6]: yes, yes, no, no, no, yes
Benkner [4]: yes, yes, no, yes, no, yes
Ge [12]: yes, yes, no, no, yes, yes
Ravi [35]: yes, yes, no, no, no, yes
Grewe [13]: yes, yes, no, yes, no, yes
This paper: yes, yes, yes, yes, yes, yes ...
Preprint
Full-text available
Heterogeneous computing systems provide high performance and energy efficiency. However, to optimally utilize such systems, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance and energy aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications our approach determines a near-optimal host-device distribution of work, number of processing units required and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of reasonable configurations. Furthermore, the performance per Joule estimation of system configurations using our machine learning model is more than 1000x faster compared to the system evaluation by program execution.
... Different target architectures are fully addressed, including GP-GPUs but also FPGAs, which makes the structured programming model effective in dealing with the more advanced architectures available. Finally, structured parallel programming models have been adopted in several EU-funded research projects (SkePU in PEPPHER [4] and Excess, FastFlow in ParaPhrase [37], REPARA and RePhrase projects) that spread adoption across different industrial contexts and therefore ensured a wider diffusion of their concepts. This is the "integration" phase, where concepts born, grown and matured in the structured parallel programming research enclave begin to permeate other communities, based on the consolidated results achieved so far. ...
... Using the rule (map(f))(X) ≡ (seq(∀ x ∈ X do x = f(x)))(X). Using the classical map fusion rule map(f) ∘ map(g) ≡ map(f ∘ g), the composition of two maps computes the same result as a single map of a function which is the composition of the two map functions. ...
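The fusion rule quoted above can be checked on a small example. The Python snippet below shows that applying map(f) after map(g) yields the same result as a single map of the composed function; the concrete functions are arbitrary examples.

# Small illustration of the map fusion rule:
# map(f) . map(g) computes the same result as map(f . g).

def compose(f, g):
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: 2 * x
xs = [1, 2, 3, 4]

two_maps = list(map(f, map(g, xs)))      # map(f) applied after map(g)
fused    = list(map(compose(f, g), xs))  # a single map of the composition
assert two_maps == fused == [3, 5, 7, 9]
print(two_maps)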
Article
Full-text available
This paper discusses the impact of structured parallel programming methodologies in state-of-the-art industrial and research parallel programming frameworks. We first recap the main ideas underpinning structured parallel programming models and then present the concepts of algorithmic skeletons and parallel design patterns. We then discuss how such concepts have permeated the wider parallel programming community. Finally, we give our personal overview—as researchers active for more than two decades in the parallel programming models and frameworks area—of the process that led to the adoption of these concepts in state-of-the-art industrial and research parallel programming frameworks, and the perspectives they open in relation to the exploitation of forthcoming massively-parallel (both general and special-purpose) architectures.
... The Performance Portability and Programmability for Heterogeneous Many-core Architectures (PEPPHER) project [1] has developed a methodology and framework for programming and optimizing applications for single-node heterogeneous manycore processors to ensure performance portability. With Intel as a key partner in the project, READEX goes one step further and provides a framework that supports the heterogeneity of the system in the form of tuning parameters, which enable large-scale heterogeneous applications to dynamically (and automatically) adapt heterogeneous resources according to run-time requirements. ...
Chapter
As in the embedded systems domain, energy efficiency has recently become one of the main design criteria in high performance computing. The European Union Horizon 2020 project READEX (Run-time Exploitation of Application Dynamism for Energy-efficient eXascale computing) has developed a tools-aided auto-tuning methodology inspired by system scenario based design. Applying similar concepts as those presented in earlier chapters of this book, the dynamic behavior of HPC applications is exploited to achieve improved energy efficiency and performance. Driven by a consortium of European experts from academia, HPC resource providers, and industry, the READEX project has developed the first generic framework of its kind for split design-time and run-time tuning while targeting heterogeneous systems at the Exascale level. Using a real-life boundary element application, energy savings of more than 30% can be shown.
... Parallelization of sequential legacy code as well as writing parallel programs from scratch is not easy, and the difficulty of programming multi-core systems is also known as the programmability wall [57]. The multi-core shift in computer architecture has accelerated the research efforts in developing new programming frameworks for parallel computing, which should assist domain scientists, for instance, by generating and optimising low-level parallel code for coordination of computations across multiple cores and multiple computers [8,58]. ...
... RQ2.8 What are the technologies used to create the language tool suite? RQ2.9 Does the language target specific hardware? ...
Article
V. Amaral, B. Norberto, M. Goulão et al., Programming languages for data-intensive HPC applications: A systematic mapping study, Parallel Computing, https://doi.org/10.1016/j.parco.2019.102584. A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity of the software for HPC, it is useful to identify programming languages that can be used to alleviate this issue. Because the existing literature on the topic of HPC is very dispersed, we performed a Systematic Mapping Study (SMS) in the context of the European COST Action cHiPSet. This literature study maps characteristics of various programming languages for data-intensive HPC applications, including category, typical user profiles, effectiveness, and type of articles. We organised the SMS in two phases. In the first phase, relevant articles are identified employing an automated keyword-based search in eight digital libraries. This led to an initial sample of 420 papers, which was then narrowed down in a second phase by human inspection of article abstracts, titles and keywords to 152 relevant articles published in the period 2006–2018. The analysis of these articles enabled us to identify 26 programming languages referred to in 33 of the relevant articles. We compared the outcome of the mapping study with results of our questionnaire-based survey that involved 57 HPC experts. The mapping study and the survey revealed that the desired features of programming languages for data-intensive HPC applications are portability, performance and usability. Furthermore, we observed that the majority of the programming languages used in the context of data-intensive HPC applications are text-based general-purpose programming languages. Typically these have a steep learning curve, which makes them difficult to adopt. We believe that the outcome of this study will inspire future research and development in programming languages for data-intensive HPC applications. Keywords: High-performance computing (HPC), Big data, Data-intensive applications, Programming languages, Domain-specific language (DSL), General-purpose language (GPL), Systematic mapping study (SMS).
... Execution of these experiments requires on average 158.7 seconds. One way to improve the execution time is by utilizing high-performance parallel computing systems [19], [20], [21]; however, this is out of the scope of this paper. ...
Preprint
Determining the optimal location of control cabinet components requires the exploration of a large configuration space. For real-world control cabinets it is impractical to evaluate all possible cabinet configurations. Therefore, we need to apply methods for intelligent exploration of the cabinet configuration space that enable finding a near-optimal configuration without evaluating all possible configurations. In this paper, we describe an approach for multi-objective optimization of control cabinet layout that is based on Pareto Simulated Annealing. Optimization aims at minimizing the total wire length used for interconnection of components and the heat convection within the cabinet. We simulate heat convection to study the warm air flow within the control cabinet and determine the optimal position of components that generate heat during operation. We evaluate and demonstrate the effectiveness of our approach empirically for various control cabinet sizes and usage scenarios.
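As a rough illustration of the optimization setting described above, the Python sketch below runs a bi-objective simulated annealing loop over component placements. It uses a simple weighted-sum scalarization rather than the Pareto acceptance rule of Pareto Simulated Annealing, and the layout encoding, objective functions, and cooling schedule are invented placeholders (the paper's actual heat objective comes from simulating warm air flow).

# Conceptual sketch of a bi-objective simulated-annealing loop over
# cabinet layouts; objectives and cooling schedule are placeholders.

import math, random

def wire_length(layout):
    # Placeholder: sum of distances between consecutive components.
    return sum(abs(a - b) for a, b in zip(layout, layout[1:]))

def heat_score(layout):
    # Placeholder: treat higher component indices as hotter and penalise
    # placing them early in the layout.
    return sum(comp * (len(layout) - pos) for pos, comp in enumerate(layout))

def scalarized(layout, w=0.5):
    # Weighted sum of the two objectives (not a true Pareto acceptance rule).
    return w * wire_length(layout) + (1 - w) * heat_score(layout)

def anneal(n_components=8, steps=2000, t0=50.0):
    layout = list(range(n_components))
    random.shuffle(layout)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6
        cand = layout[:]
        i, j = random.sample(range(n_components), 2)
        cand[i], cand[j] = cand[j], cand[i]          # swap two components
        delta = scalarized(cand) - scalarized(layout)
        if delta < 0 or random.random() < math.exp(-delta / t):
            layout = cand
    return layout, wire_length(layout), heat_score(layout)

print(anneal())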
... Many approaches have been proposed for using HPC systems [18,39,56,60]. While multi-core CPUs are suitable for general-purpose tasks, many-core processors (such as the Intel Xeon Phi [20] or GPUs [58]) comprise a larger number of lower-frequency cores and perform well on scalable applications [54] (such as DNA sequence analysis [55] or deep learning [79]). ...
Chapter
Full-text available
Recent developments in sensor technology, wearable computing, Internet of Things (IoT), and wireless communication have given rise to research in ubiquitous healthcare and remote monitoring of human health and activities. Health monitoring systems involve processing and analysis of data retrieved from smartphones, smart watches, smart bracelets, as well as various sensors and wearable devices. Such systems enable continuous monitoring of patients' psychological and health conditions by sensing and transmitting measurements such as heart rate, electrocardiogram, body temperature, respiratory rate, chest sounds, or blood pressure. Pervasive healthcare, as a relevant application domain in this context, aims at revolutionizing the delivery of medical services through a medical assistive environment and facilitates the independent living of patients. In this chapter, we discuss (1) data collection, fusion, ownership and privacy issues; (2) models, technologies and solutions for