Table 2 - uploaded by Enrique Alba
Parameter settings for the GPU models used

Source publication
Article
Full-text available
In recent years, graphics processing units (GPUs) have emerged as a powerful architecture for solving a broad spectrum of applications in very short periods of time. However, most existing GPU optimization approaches do not exploit the full power available in a CPU–GPU platform. They have a tendency to leave one of them partially unused (usually th...

Context in source publication

Context 1
... In the case of the GPU, we used two GPU models to assess the performance behavior of HySyS. Details of each model are presented in Table 2. The HySyS approach tested on each GPU model is referred to as HySyS n650 and HySyS n780, respectively. ...

Citations

... Vidal P has developed a hybrid approach to effectively solve optimization problems; a CPU-GPU hybrid architecture benefits from running the algorithm in parallel on both the CPU and the GPU [1]. To evaluate the robustness of CAD or quantitative imaging methods, Young S tests them in various situations and under diverse image acquisition and reconstruction conditions that represent the heterogeneity encountered in clinical practice. ...
Article
Full-text available
With the rapid development of network technology and parallel computing, clusters formed by connecting large numbers of PCs over high-speed networks have, thanks to their cost-effectiveness, gradually replaced supercomputers in scientific research, production, and high-performance computing. The purpose of this paper is to integrate the Kriging proxy-model method and an energy-efficiency modeling method into a cluster optimization algorithm for a hybrid CPU-GPU architecture. The paper proposes a parallel computing model for large-scale CPU/GPU heterogeneous high-performance computing systems that can effectively describe the computing capabilities and the various communication behaviors of CPU/GPU heterogeneous systems, and ultimately provides algorithm optimization for CPU/GPU heterogeneous clusters. According to the GPU architecture, an efficient method for constructing the Kriging proxy model and an optimized search algorithm are designed. The experimental results show that constructing the Kriging proxy model achieves a speedup of 220, while the search algorithm reaches a speedup of 8. This demonstrates that the heterogeneous cluster optimization algorithm is highly feasible.
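The workload-splitting idea behind such hybrid CPU-GPU algorithms can be sketched in a few lines. The function below is a minimal sketch, not the cited paper's performance model: all names and the relative-throughput figures are illustrative assumptions. It statically partitions a batch of independent evaluations proportionally to each device's measured throughput:

```python
# Sketch of static workload partitioning for a heterogeneous CPU/GPU pool.
# Device names and throughput figures are illustrative assumptions; the
# cited work derives its partition from a calibrated performance model.

def partition_work(n_tasks, throughputs):
    """Split n_tasks among devices proportionally to their throughput."""
    total = sum(throughputs.values())
    shares = {dev: int(n_tasks * t / total) for dev, t in throughputs.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += n_tasks - sum(shares.values())
    return shares

# Example: a GPU measured ~8x faster than each of 4 CPU cores.
devices = {"gpu0": 8.0, "cpu0": 1.0, "cpu1": 1.0, "cpu2": 1.0, "cpu3": 1.0}
print(partition_work(1000, devices))
```

A static split like this only pays off when per-task costs are uniform; otherwise a dynamic scheme (workers requesting chunks on demand) balances better at the price of more communication.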
... In this case, the proposed algorithm is still highly applicable because the effectiveness property is beneficial for achieving maximal throughput within limited computational time. Moreover, recent literature in the field of hardware architecture has shown that the computational time for solving optimization problems can be significantly reduced [39]. Thus, the influence of computational delay can be mitigated by exploiting the most up-to-date hardware techniques. ...
Article
Full-text available
This paper investigates the first resource management problem for vehicular-to-infrastructure (V2I) networks under the in-band full-duplex (IBFD) backhauling scheme with guaranteed ultra-reliable and low-latency (URLLC) service. The considered networks suffer from interference caused by three-node transmission in the IBFD scheme and by the mobility of vehicular user equipments (VUEs). The resource allocation problem is formulated to jointly optimize VUE association, resource block assignment (RA), and power allocation (PA) while satisfying the reliability and latency constraints at the same time. The formulated problem is a mixed-integer non-linear problem with a non-convex objective function and constraints. Finding a globally optimal solution for this type of problem is still an open problem. To develop tractable solutions, the original problem is first simplified using a derived equivalent expression for the objective function. The proposed method then decomposes it into RA and PA sub-problems. In each iteration, the RA sub-problem is solved for a given PA solution, while the PA sub-problem is solved under the determined RA results. The RA and PA sub-problems are solved iteratively until the convergence condition is reached. Theoretical analysis proves that the proposed method achieves a Nash-stable equilibrium and local optimality for the RA and PA sub-problems, respectively. Simulation results verify the effectiveness of the derived performance analysis and demonstrate that the proposed algorithm outperforms state-of-the-art algorithms in the literature.
... Many contributions on parallel implementations of evolutionary algorithms on CPU-GPU platforms have been proposed in the literature, although most of them do not completely exploit the CPU-GPU computing capacity, as they use only one CPU thread to control the GPU activity [11]. Among the approaches for GPU-based implementations of evolutionary algorithms analyzed in [12], some propose implementing all or most of the steps of the evolutionary algorithm on the GPU to reduce the cost of transferring information between the CPU and the GPU. ...
... There are few approaches that, like this paper, treat the CPU and GPU as resources to be considered equally when distributing the workload of the optimization procedure. Paper [11] proposes a methodology for solving optimization problems on heterogeneous CPU-GPU architectures that benefits from both CPU and GPU devices, and points out the usefulness of further research on this approach. Our approach includes an evolutionary multi-objective optimization and a clustering algorithm applied to a set of high-dimensional patterns. ...
... Normally, some devices finish their work before others, causing idle time on the most powerful devices and thus not only an increase in energy consumption but also a reduction in the acceleration of the algorithm. Hence, in the v3 version, each thread, W t j, now has the capacity to return its subpopulation, Sp l i, directly to the master and ask for a new subpopulation (lines 11-12). The cost of introducing this improvement is the elimination of the so-called local migrations, implemented in the v1 and v2 versions to migrate individuals between the subpopulations of each device. ...
Article
Full-text available
Present heterogeneous architectures interconnect nodes including multiple multi-core microprocessors and accelerators, which allows different strategies to accelerate applications and optimize their energy consumption according to specific power-performance trade-offs. In this paper, a multilevel parallel procedure is proposed to take advantage of all the nodes of a heterogeneous CPU-GPU cluster. Two further alternatives have been implemented and experimentally compared and analyzed in terms of both running time and energy consumption. Although the paper considers an evolutionary master-worker algorithm for feature selection in EEG classification, the conclusions from the experimental analysis provided here can be widely applied, as many other useful bioinformatics and data mining applications show the same master-worker profile as the classification problem considered here. Our parallel approach reduces the time by a factor of up to 83, while consuming only about 4.9% of the energy used by the sequential procedure, on a cluster with 36 CPU cores and 43 GPU compute units.
... This way, the development of energy-performance efficient codes for heterogeneous CPU-GPU systems needs to address hardware and software issues related to the cooperation among the CPU-GPU nodes [32], along with the challenges of CPU-GPU heterogeneous computing. Among those are the sizes of the CPU and GPU memories, the CPU-GPU memory-bandwidth limitations, load balancing among the CPU and GPU cores, overlapping data transfer with CPU and GPU computation, and the parallelism profile of the application considered. ...
... Moreover, in recent years, some significant works have appeared dealing with the energy-consumption issues of CPU and GPU architectures. Nevertheless, most of them do not completely exploit the CPU-GPU computing and energy-saving capabilities, as they usually take advantage of the thread and data parallelism available in the GPU while using only one or a few CPU threads to control the GPU activity [32]. ...
... On the other hand, there are few approaches that treat the CPU and GPU cores as resources to be considered equally when distributing the workload of the procedure. Paper [32] proposes a methodology for solving optimization problems on heterogeneous CPU-GPU architectures that benefits from both CPU and GPU cores, and points out the usefulness of further research on this approach. Although the use of heterogeneous architectures including data-parallel devices such as GPUs has been proposed in previous papers, the parallelization on a heterogeneous platform of a whole data mining application with the characteristics of our target application is less frequent in the literature. ...
Conference Paper
Many data mining applications in bioinformatics and bioengineering require solving problems with different profiles from the point of view of their implicit parallelism. In this context, heterogeneous architectures comprising interconnected nodes with multiple multi-core microprocessors and accelerators, such as vector processors, graphics processing units (GPUs), or field-programmable gate arrays, constitute suitable platforms that offer the possibility not only of accelerating the running time of applications but also of optimizing their energy consumption. In this paper, we analyze the speedups and energy consumption of a parallel multiobjective approach for feature selection and classification of electroencephalograms in brain-computer interface tasks, considering different implementation alternatives on a heterogeneous CPU-GPU cluster. The procedure is able to take advantage of parallelism through message passing among the CPU-GPU nodes of the cluster, through shared-memory and thread-level parallelism in the CPU cores, and through data-level and thread-level parallelism in the GPU. The experimental results show high code accelerations and high energy savings: running times between 1.4% and 5.3% of the sequential time, and energy consumption between 5.9% and 11.6% of the energy consumed by the sequential execution.
... Moreover, besides offering opportunities to execute efficient parallel codes, heterogeneous architectures that include CPU and GPU cores could also constitute an efficient approach to energy saving, and papers such as [14] consider the efficient cooperation of CPU and GPU an important concern for reaching exascale performance. This way, the development of energy-performance efficient codes for heterogeneous CPU-GPU systems needs to address hardware and software issues related to the cooperation among CPU-GPU nodes [22], along with the challenges involved in CPU-GPU heterogeneous computing. Among those are the sizes of the CPU and GPU memories, the CPU-GPU memory-bandwidth limitations, load balancing among the CPU and GPU cores, overlapping data transfer with CPU and GPU computation, and the parallelism profile of the application considered. ...
... A relatively high number of contributions on parallel implementations of evolutionary algorithms on CPU-GPU platforms can be found in the literature. Nevertheless, most of them do not completely exploit the CPU-GPU computing power, as they usually take advantage of the thread and data parallelism available in the GPU while using only one CPU thread to control the GPU activity [22]. ...
... There are few approaches that treat the CPU and GPU cores as resources to be considered equally when distributing the workload of the optimization procedure. Paper [22] proposes a methodology for solving optimization problems on heterogeneous CPU-GPU architectures that benefits from both CPU and GPU cores, and points out the usefulness of further research on this approach. Our procedure includes an evolutionary multi-objective optimization and a clustering algorithm applied to a set of high-dimensional patterns. ...
Conference Paper
Full-text available
The present trend in the development of computer architectures that offer improvements in both performance and energy efficiency has provided clusters with interconnected nodes including multiple multi-core microprocessors and accelerators. In these so-called heterogeneous computers, applications can take advantage of different parallelism levels according to the characteristics of the architectures in the platform. Thus, applications should be properly programmed to reach good efficiencies, not only with respect to the achieved speedups but also taking into account issues related to energy consumption. In this paper, we provide a multi-objective evolutionary algorithm for feature selection in electroencephalogram (EEG) classification, which can take advantage of parallelism at multiple levels: among the CPU-GPU nodes interconnected in the cluster (through message passing), and inside these nodes (through shared-memory thread-level parallelism in the CPU cores, and data-level and thread-level parallelism in the GPU). The procedure has been experimentally evaluated in terms of performance and energy consumption, and shows statistically significant benefits for feature selection: speedups of up to 73 while requiring only 6% of the energy consumed by the sequential code.
... Recently, graphics processing units (GPUs) have emerged as an alternative for reducing computational times by factors of dozens, including for metaheuristics [29]. The development of GPU-based heuristics and of hybrid CPU-GPU heuristic strategies can be found in [2,3,6,9,10,13,17,29,34-36,40]. ...
Article
Full-text available
We propose parallel synchronous and asynchronous implementations of the coupled simulated annealing (CSA) algorithm on a shared-memory architecture. The original CSA was implemented synchronously on a distributed-memory architecture. It synchronizes at each temperature update, which leads to idling and loss of efficiency as the number of processors increases. The proposed synchronous CSA (SCSA) is implemented like the original, but on a shared-memory architecture. The proposed asynchronous CSA (ACSA) does not synchronize, allowing higher parallel efficiency with larger numbers of processors. Results from extensive experiments show that the proposed ACSA produces solutions of much better quality than both the serial version and the SCSA. The experiments also show that the ACSA outperforms the SCSA for less computationally intensive problems or when a larger number of processing cores is available. Moreover, the parallel efficiency of the ACSA improves as the problem size increases. With the advent of the multi-core era, the use of the proposed algorithm becomes more attractive than the original synchronous CSA.
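The distinction the abstract draws can be reduced to where the temperature update happens. The toy annealer below is a sketch under assumed names and a simplified objective (minimizing x squared); the real CSA additionally couples the acceptance probabilities across optimizers, which is omitted here:

```python
# Toy annealer illustrating the asynchronous style: each run owns its own
# temperature schedule and never waits for the others. A synchronous CSA
# would instead barrier-synchronize all runs at every temperature update.
# Names, objective, and parameters are illustrative assumptions.
import math
import random

def anneal(steps, t0, alpha, seed):
    """One independent annealer minimizing x**2: geometric cooling."""
    rng = random.Random(seed)
    x, temp = 10.0, t0
    for _ in range(steps):
        cand = x + rng.uniform(-1.0, 1.0)
        delta = cand * cand - x * x
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = cand
        temp *= alpha        # asynchronous: cooling on this run's own clock
    return x

# Four independent runs, as the asynchronous variant would execute them.
finals = [anneal(500, t0=5.0, alpha=0.99, seed=s) for s in range(4)]
print(len(finals))
```

In a shared-memory ACSA these runs would be threads updating a shared best-so-far; the point of the sketch is only that no barrier separates their temperature updates.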
... The authors noticed that parallelizing the improvement method has the greatest impact on performance compared to the other components of SS. The results can be seen in Table 2. Another algorithm, the systolic neighborhood search (SNS) [60], has been used in [79] to solve three combinatorial optimization problems: the massively multimodal deceptive problem (MMDP), the subset sum problem (SSP), and the maximum cut problem (MAXCUT). SNS is defined as a set of cells (GPU threads in our case) that form a mesh network among themselves. ...
... Then, each cell sends its solution to the neighboring cell. A CPU-GPU cooperative implementation of a hybrid systolic search has been proposed in [79]. In this design, a CPU thread is dedicated to transferring data to the GPU. ...
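The systolic idea in these snippets, where cells improve their solutions locally and then forward them to a neighbor on each pulse, can be sketched with a ring of list slots standing in for GPU threads. All names and the toy local search are illustrative assumptions:

```python
# Sketch of a systolic pulse: every cell improves its solution, then all
# solutions shift one cell around the ring, so each cell keeps working on
# a solution its neighbor has already refined. Cells are plain list slots
# here; in the cited design they would be GPU threads in a mesh.

def systolic_step(solutions, improve):
    """Improve each cell's solution, then rotate: cell i feeds cell i+1."""
    improved = [improve(s) for s in solutions]
    return improved[-1:] + improved[:-1]

# Toy local search on integers: move one unit toward zero.
def improve(s):
    return s - 1 if s > 0 else s + 1 if s < 0 else s

ring = [5, -3, 8, 0]
for _ in range(3):          # three systolic pulses
    ring = systolic_step(ring, improve)
print(ring)                 # → [0, 5, 0, 2]
```

Because every cell communicates only with a fixed neighbor, the pattern maps well onto GPU threads: data movement is regular and there is no global synchronization beyond the pulse itself.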
Article
Metaheuristics have been showing interesting results in solving hard optimization problems and can obtain satisfying solutions in a reasonable time. However, they suffer from a lack of scalability, becoming limited in terms of effectiveness and runtime for complex, high-dimensional problems. Thanks to the independence of metaheuristic components, parallel computing appears as an attractive choice for reducing the execution time and improving solution quality. By exploiting the increasing performance and programmability of graphics processing units (GPUs) to this end, GPU-based parallel metaheuristics have been implemented using different designs. Recent results in this area show that GPUs tend to be effective co-processors for tackling complex optimization problems, achieving better results in terms of both computation time and solution quality. In this survey, the mechanisms involved in GPU programming for implementing parallel metaheuristics are presented and discussed through a study of relevant research papers.
Article
In the last 35 years, parallel computing has drawn increasing interest from the academic community, especially in solving complex optimization problems that require large amounts of computational power. The use of parallel (multi-core and distributed) architectures is a natural and effective alternative for speeding up search methods, such as metaheuristics, and for enhancing the quality of the solutions. This survey focuses particularly on studies that adopt high-performance computing techniques to design, implement, and experiment with trajectory-based metaheuristics, which pose a great challenge to high-performance computing and represent a large gap in the operations research literature. We outline the contributions from 1987 to the present, and the result is a complete overview of the current state of the art with respect to multi-core and distributed trajectory-based metaheuristics. Basic notions of high-performance computing are introduced, and different taxonomies for multi-core and distributed architectures and metaheuristics are reviewed. A comprehensive list of 127 publications is summarized and classified according to taxonomies and application types. Furthermore, past and future trends are indicated, and open research gaps are identified.
Chapter
Aiming at the low utilization of intensive computing cores in large heterogeneous clusters and the high complexity of collaborative computing between GPUs and multi-core CPUs, this paper seeks to improve resource utilization and reduce programming complexity in heterogeneous clusters. A new heterogeneous-cluster cooperative computing model and an engine design scheme are proposed. The complexity of the communication between nodes and of the cooperative mechanism within nodes is analyzed. The coarse-grained cooperative execution plan is represented by template technology, and the fine-grained cooperative computing plan is created by finite automata. The experimental results validate the effectiveness of the collaborative engine: compared with a manual programming scheme, the computational performance loss is less than 4.2%. The collaborative computing engine can effectively improve the resource utilization of large-scale heterogeneous clusters and the programming efficiency of ordinary developers.
Article
Solving optimization problems with parallel algorithms has a long tradition in OR. Its future relevance for solving hard optimization problems in many fields, including finance, logistics, production, and design, is leveraged by the increasing availability of powerful computing capabilities. Acknowledging the existence of several literature reviews on parallel optimization, we did not find reviews that cover the most recent literature on the parallelization of both exact and (meta)heuristic methods. However, in the past decade, substantial advancements in parallel computing capabilities have been achieved and used by OR scholars, so an overview of modern parallel optimization in OR that accounts for these advancements is beneficial. Another issue with previous reviews results from their adoption of different foci, so that the concepts used to describe and structure prior literature differ. This heterogeneity is accompanied by a lack of unifying frameworks for parallel optimization across methodologies, application fields, and problems, and it has finally led to an overall fragmented picture of what has been achieved and what still needs to be done in parallel optimization in OR. This review addresses the aforementioned issues with three contributions. First, we suggest a new integrative framework of parallel computational optimization across optimization problems, algorithms, and application domains. The framework integrates the perspectives of algorithmic design and computational implementation of parallel optimization. Second, we apply the framework to synthesize prior research on parallel optimization in OR, focusing on computational studies published in the period 2008-2017. Finally, we suggest research directions for parallel optimization in OR.