Table 2 - uploaded by Enrique Alba
Parameter settings for the GPU models used

Source publication
Article
Full-text available
In recent years, graphics processing units (GPUs) have emerged as a powerful architecture for solving a broad spectrum of applications in very short periods of time. However, most existing GPU optimization approaches do not exploit the full power available in a CPU–GPU platform. They have a tendency to leave one of them partially unused (usually th...

Context in source publication

Context 1
... In the case of the GPU, we used two GPU models to assess the performance behavior of HySyS. Details of each model are presented in Table 2. The HySyS approach tested on each GPU model is referred to as HySyS n650 and HySyS n780, respectively. ...

Citations

... Vidal P has developed a hybrid approach to effectively solve optimization problems; a CPU-GPU hybrid architecture benefits from running the algorithm in parallel on both the CPU and the GPU [1]. To evaluate the robustness of CAD or quantitative imaging methods, Young S tests them in various situations and under diverse image acquisition and reconstruction conditions that represent the heterogeneity encountered in clinical practice. ...
Article
Full-text available
With the rapid development of network technology and parallel computing, clusters formed by connecting large numbers of PCs over high-speed networks have, thanks to their cost-effectiveness, gradually replaced supercomputers in scientific research, production, and high-performance computing. The purpose of this paper is to integrate the Kriging proxy-model method and an energy-efficiency modeling method into a cluster optimization algorithm for a hybrid CPU-GPU architecture. The paper proposes a parallel computing model for large-scale CPU/GPU heterogeneous high-performance computing systems that can effectively describe the computing capabilities and the various communication behaviors of CPU/GPU heterogeneous systems, and ultimately provides algorithm optimization for CPU/GPU heterogeneous clusters. According to the GPU architecture, an efficient method for constructing the Kriging proxy model and an optimized search algorithm are designed. The experimental results show that constructing the Kriging proxy model achieves a speedup of 220, while the search algorithm reaches a speedup of 8. This demonstrates that the heterogeneous cluster optimization algorithm is highly feasible.
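The workload-splitting idea behind such hybrid CPU-GPU algorithms can be sketched in a few lines. The function below is a minimal sketch, not the cited paper's performance model: all names and the relative-throughput figures are illustrative assumptions. It statically partitions a batch of independent evaluations proportionally to each device's measured throughput:

```python
# Sketch of static workload partitioning for a heterogeneous CPU/GPU pool.
# Device names and throughput figures are illustrative assumptions; the
# cited work derives its partition from a calibrated performance model.

def partition_work(n_tasks, throughputs):
    """Split n_tasks among devices proportionally to their throughput."""
    total = sum(throughputs.values())
    shares = {dev: int(n_tasks * t / total) for dev, t in throughputs.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += n_tasks - sum(shares.values())
    return shares

# Example: a GPU measured ~8x faster than each of 4 CPU cores.
devices = {"gpu0": 8.0, "cpu0": 1.0, "cpu1": 1.0, "cpu2": 1.0, "cpu3": 1.0}
print(partition_work(1000, devices))
```

A static split like this only pays off when per-task costs are uniform; otherwise a dynamic scheme (workers requesting chunks on demand) balances better at the price of more communication.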
... In this case, the proposed algorithm is still highly applicable because the effectiveness property is beneficial for achieving maximal throughput within limited computational time. Moreover, recent literature in the field of hardware architecture has shown that the computational time for solving optimization problems can be significantly reduced [39]. Thus, the influence of computational delay can be mitigated by exploiting the most up-to-date hardware techniques. ...
Article
Full-text available
This paper investigates the first resource management problem for vehicular-to-infrastructure (V2I) networks under the in-band full-duplex (IBFD) backhauling scheme with guaranteed ultra-reliable and low-latency (URLLC) service. The considered networks suffer from interference caused by three-node transmission in the IBFD scheme and by the mobility of vehicular user equipments (VUEs). The resource allocation problem is formulated to jointly optimize VUE association, resource block assignment (RA), and power allocation (PA) while satisfying the reliability and latency constraints at the same time. The formulated problem is a mixed-integer non-linear problem with a non-convex objective function and constraints. Finding a globally optimal solution for this type of problem is still an open problem. To develop tractable solutions, the original problem is first simplified using a derived equivalent expression for the objective function. The proposed method then decomposes it into RA and PA sub-problems. In each iteration, the RA sub-problem is solved for a given PA solution, while the PA sub-problem is solved under the determined RA results. The RA and PA sub-problems are solved iteratively until the convergence condition is reached. Theoretical analysis proves that the proposed method achieves a Nash-stable equilibrium and local optimality for the RA and PA sub-problems, respectively. Simulation results verify the effectiveness of the derived performance analysis and demonstrate that the proposed algorithm outperforms state-of-the-art algorithms in the literature.
... Many contributions on parallel implementations of evolutionary algorithms on CPU-GPU platforms have been proposed in the literature, although most of them do not completely exploit the CPU-GPU computing capacity, as they use only one CPU thread to control the GPU activity [11]. Among the approaches for GPU-based implementations of evolutionary algorithms analyzed in [12], some propose implementing all or most of the steps of the evolutionary algorithm on the GPU to reduce the cost of transferring information between the CPU and the GPU. ...
... There are few approaches that, like this paper, treat the CPU and GPU as resources to be considered equally when distributing the workload of the optimization procedure. Paper [11] proposes a methodology for solving optimization problems on heterogeneous CPU-GPU architectures that benefits from both CPU and GPU devices, and points out the usefulness of further research on this approach. Our approach includes an evolutionary multi-objective optimization and a clustering algorithm applied to a set of high-dimensional patterns. ...
... Normally, some devices finish their work before others, causing idle time on the most powerful devices and thus not only an increase in energy consumption but also a reduction in the acceleration of the algorithm. Hence, in the v3 version, each thread, W t j, now has the capacity to return its subpopulation, Sp l i, directly to the master and ask for a new subpopulation (lines 11-12). The cost of introducing this improvement is the elimination of the so-called local migrations, implemented in the v1 and v2 versions to migrate individuals between the subpopulations of each device. ...
Article
Full-text available
Present heterogeneous architectures interconnect nodes including multiple multi-core microprocessors and accelerators, which allows different strategies to accelerate applications and optimize their energy consumption according to specific power-performance trade-offs. In this paper, a multilevel parallel procedure is proposed to take advantage of all the nodes of a heterogeneous CPU-GPU cluster. Two further alternatives have been implemented and experimentally compared and analyzed in terms of both running time and energy consumption. Although the paper considers an evolutionary master-worker algorithm for feature selection in EEG classification, the conclusions from the experimental analysis provided here can be widely applied, as many other useful bioinformatics and data mining applications show the same master-worker profile as the classification problem considered here. Our parallel approach reduces the time by a factor of up to 83, while consuming only about 4.9% of the energy used by the sequential procedure, on a cluster with 36 CPU cores and 43 GPU compute units.
... This way, the development of energy-performance efficient codes for heterogeneous CPU-GPU systems needs to address hardware and software issues related to the cooperation among the CPU-GPU nodes [32], along with the challenges of CPU-GPU heterogeneous computing. Among those are the sizes of the CPU and GPU memories, the CPU-GPU memory-bandwidth limitations, load balancing among the CPU and GPU cores, overlapping data transfer with CPU and GPU computation, and the parallelism profile of the application considered. ...
... Moreover, in recent years, some significant works have appeared dealing with the energy-consumption issues of CPU and GPU architectures. Nevertheless, most of them do not completely exploit the CPU-GPU computing and energy-saving capabilities, as they usually take advantage of the thread and data parallelism available in the GPU while using only one or a few CPU threads to control the GPU activity [32]. ...
... On the other hand, there are few approaches that treat the CPU and GPU cores as resources to be considered equally when distributing the workload of the procedure. Paper [32] proposes a methodology for solving optimization problems on heterogeneous CPU-GPU architectures that benefits from both CPU and GPU cores, and points out the usefulness of further research on this approach. Although the use of heterogeneous architectures including data-parallel devices such as GPUs has been proposed in previous papers, the parallelization on a heterogeneous platform of a whole data mining application with the characteristics of our target application is less frequent in the literature. ...
Conference Paper
Many data mining applications in bioinformatics and bioengineering require solving problems with different profiles from the point of view of their implicit parallelism. In this context, heterogeneous architectures comprising interconnected nodes with multiple multi-core microprocessors and accelerators, such as vector processors, graphics processing units (GPUs), or field-programmable gate arrays, constitute suitable platforms that offer the possibility not only of accelerating the running time of applications but also of optimizing their energy consumption. In this paper, we analyze the speedups and energy consumption of a parallel multiobjective approach for feature selection and classification of electroencephalograms in brain-computer interface tasks, considering different implementation alternatives on a heterogeneous CPU-GPU cluster. The procedure is able to take advantage of parallelism through message passing among the CPU-GPU nodes of the cluster, through shared-memory and thread-level parallelism in the CPU cores, and through data-level and thread-level parallelism in the GPU. The experimental results show high code accelerations and high energy savings: running times between 1.4% and 5.3% of the sequential time, and energy consumption between 5.9% and 11.6% of the energy consumed by the sequential execution.
... Moreover, besides offering opportunities to execute efficient parallel codes, heterogeneous architectures that include CPU and GPU cores could also constitute an efficient approach to energy saving, and papers such as [14] consider the efficient cooperation of CPU and GPU an important concern for reaching exascale performance. This way, the development of energy-performance efficient codes for heterogeneous CPU-GPU systems needs to address hardware and software issues related to the cooperation among CPU-GPU nodes [22], along with the challenges involved in CPU-GPU heterogeneous computing. Among those are the sizes of the CPU and GPU memories, the CPU-GPU memory-bandwidth limitations, load balancing among the CPU and GPU cores, overlapping data transfer with CPU and GPU computation, and the parallelism profile of the application considered. ...
... A relatively high number of contributions on parallel implementations of evolutionary algorithms on CPU-GPU platforms can be found in the literature. Nevertheless, most of them do not completely exploit the CPU-GPU computing power, as they usually take advantage of the thread and data parallelism available in the GPU while using only one CPU thread to control the GPU activity [22]. ...
... There are few approaches that treat the CPU and GPU cores as resources to be considered equally when distributing the workload of the optimization procedure. Paper [22] proposes a methodology for solving optimization problems on heterogeneous CPU-GPU architectures that benefits from both CPU and GPU cores, and points out the usefulness of further research on this approach. Our procedure includes an evolutionary multi-objective optimization and a clustering algorithm applied to a set of high-dimensional patterns. ...
Conference Paper
Full-text available
The present trend in the development of computer architectures that offer improvements in both performance and energy efficiency has provided clusters with interconnected nodes including multiple multi-core microprocessors and accelerators. In these so-called heterogeneous computers, applications can take advantage of different parallelism levels according to the characteristics of the architectures in the platform. Thus, applications should be properly programmed to reach good efficiencies, not only with respect to the achieved speedups but also taking into account issues related to energy consumption. In this paper, we provide a multi-objective evolutionary algorithm for feature selection in electroencephalogram (EEG) classification, which can take advantage of parallelism at multiple levels: among the CPU-GPU nodes interconnected in the cluster (through message passing), and inside these nodes (through shared-memory thread-level parallelism in the CPU cores, and data-level and thread-level parallelism in the GPU). The procedure has been experimentally evaluated in terms of performance and energy consumption, and shows statistically significant benefits for feature selection: speedups of up to 73 while requiring only 6% of the energy consumed by the sequential code.
... Recently, graphics processing units (GPUs) have emerged as an alternative for reducing computational times by factors of dozens, including for metaheuristics [29]. The development of GPU-based heuristics and of hybrid CPU-GPU heuristic strategies can be found in [2,3,6,9,10,13,17,29,34-36,40]. ...
Article
Full-text available
We propose parallel synchronous and asynchronous implementations of the coupled simulated annealing (CSA) algorithm on a shared-memory architecture. The original CSA was implemented synchronously on a distributed-memory architecture. It synchronizes at each temperature update, which leads to idling and loss of efficiency as the number of processors increases. The proposed synchronous CSA (SCSA) is implemented like the original, but on a shared-memory architecture. The proposed asynchronous CSA (ACSA) does not synchronize, allowing higher parallel efficiency with larger numbers of processors. Results from extensive experiments show that the proposed ACSA produces solutions of much better quality than both the serial version and the SCSA. The experiments also show that the ACSA outperforms the SCSA for less computationally intensive problems or when a larger number of processing cores is available. Moreover, the parallel efficiency of the ACSA improves as the problem size increases. With the advent of the multi-core era, the use of the proposed algorithm becomes more attractive than the original synchronous CSA.
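The distinction the abstract draws can be reduced to where the temperature update happens. The toy annealer below is a sketch under assumed names and a simplified objective (minimizing x squared); the real CSA additionally couples the acceptance probabilities across optimizers, which is omitted here:

```python
# Toy annealer illustrating the asynchronous style: each run owns its own
# temperature schedule and never waits for the others. A synchronous CSA
# would instead barrier-synchronize all runs at every temperature update.
# Names, objective, and parameters are illustrative assumptions.
import math
import random

def anneal(steps, t0, alpha, seed):
    """One independent annealer minimizing x**2: geometric cooling."""
    rng = random.Random(seed)
    x, temp = 10.0, t0
    for _ in range(steps):
        cand = x + rng.uniform(-1.0, 1.0)
        delta = cand * cand - x * x
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = cand
        temp *= alpha        # asynchronous: cooling on this run's own clock
    return x

# Four independent runs, as the asynchronous variant would execute them.
finals = [anneal(500, t0=5.0, alpha=0.99, seed=s) for s in range(4)]
print(len(finals))
```

In a shared-memory ACSA these runs would be threads updating a shared best-so-far; the point of the sketch is only that no barrier separates their temperature updates.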
... The authors noticed that parallelizing the improvement method has the greatest impact on performance compared to the other components of SS. The results can be seen in Table 2. Another algorithm, the systolic neighborhood search (SNS) [60], has been used in [79] to solve three combinatorial optimization problems: the massively multimodal deceptive problem (MMDP), the subset sum problem (SSP), and the maximum cut problem (MAXCUT). SNS is defined as a set of cells (GPU threads in our case) that form a mesh network among themselves. ...
... Then, each cell sends its solution to the neighboring cell. A CPU-GPU cooperative implementation of a hybrid systolic search has been proposed in [79]. In this design, a CPU thread is dedicated to transferring data to the GPU. ...
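The systolic idea in these snippets, where cells improve their solutions locally and then forward them to a neighbor on each pulse, can be sketched with a ring of list slots standing in for GPU threads. All names and the toy local search are illustrative assumptions:

```python
# Sketch of a systolic pulse: every cell improves its solution, then all
# solutions shift one cell around the ring, so each cell keeps working on
# a solution its neighbor has already refined. Cells are plain list slots
# here; in the cited design they would be GPU threads in a mesh.

def systolic_step(solutions, improve):
    """Improve each cell's solution, then rotate: cell i feeds cell i+1."""
    improved = [improve(s) for s in solutions]
    return improved[-1:] + improved[:-1]

# Toy local search on integers: move one unit toward zero.
def improve(s):
    return s - 1 if s > 0 else s + 1 if s < 0 else s

ring = [5, -3, 8, 0]
for _ in range(3):          # three systolic pulses
    ring = systolic_step(ring, improve)
print(ring)                 # → [0, 5, 0, 2]
```

Because every cell communicates only with a fixed neighbor, the pattern maps well onto GPU threads: data movement is regular and there is no global synchronization beyond the pulse itself.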
Article
Metaheuristics have been showing interesting results in solving hard optimization problems and can obtain satisfying solutions in a reasonable time. However, they suffer from a lack of scalability, becoming limited in terms of effectiveness and runtime for complex, high-dimensional problems. Thanks to the independence of metaheuristic components, parallel computing appears as an attractive choice for reducing the execution time and improving solution quality. By exploiting the increasing performance and programmability of graphics processing units (GPUs) to this end, GPU-based parallel metaheuristics have been implemented using different designs. Recent results in this area show that GPUs tend to be effective co-processors for tackling complex optimization problems, achieving better results in terms of both computation time and solution quality. In this survey, the mechanisms involved in GPU programming for implementing parallel metaheuristics are presented and discussed through a study of relevant research papers.
Article
In the last 35 years, parallel computing has drawn increasing interest from the academic community, especially in solving complex optimization problems that require large amounts of computational power. The use of parallel (multi-core and distributed) architectures is a natural and effective alternative for speeding up search methods, such as metaheuristics, and for enhancing the quality of the solutions. This survey focuses particularly on studies that adopt high-performance computing techniques to design, implement, and experiment with trajectory-based metaheuristics, which pose a great challenge to high-performance computing and represent a large gap in the operations research literature. We outline the contributions from 1987 to the present, and the result is a complete overview of the current state of the art with respect to multi-core and distributed trajectory-based metaheuristics. Basic notions of high-performance computing are introduced, and different taxonomies for multi-core and distributed architectures and metaheuristics are reviewed. A comprehensive list of 127 publications is summarized and classified according to taxonomies and application types. Furthermore, past and future trends are indicated, and open research gaps are identified.
Chapter
Aiming at the low utilization of intensive computing cores in large heterogeneous clusters and the high complexity of collaborative computing between GPUs and multi-core CPUs, this paper seeks to improve resource utilization and reduce programming complexity in heterogeneous clusters. A new heterogeneous-cluster cooperative computing model and an engine design scheme are proposed. The complexity of the communication between nodes and of the cooperative mechanism within nodes is analyzed. The coarse-grained cooperative execution plan is represented by template technology, and the fine-grained cooperative computing plan is created by finite automata. The experimental results validate the effectiveness of the collaborative engine: compared with a manual programming scheme, the computational performance loss is less than 4.2%. The collaborative computing engine can effectively improve the resource utilization of large-scale heterogeneous clusters and the programming efficiency of ordinary developers.
Article
Solving optimization problems with parallel algorithms has a long tradition in OR. Its future relevance for solving hard optimization problems in many fields, including finance, logistics, production, and design, is leveraged by the increasing availability of powerful computing capabilities. Acknowledging the existence of several literature reviews on parallel optimization, we did not find reviews that cover the most recent literature on the parallelization of both exact and (meta)heuristic methods. However, in the past decade, substantial advancements in parallel computing capabilities have been achieved and used by OR scholars, so an overview of modern parallel optimization in OR that accounts for these advancements is beneficial. Another issue with previous reviews results from their adoption of different foci, so that the concepts used to describe and structure prior literature differ. This heterogeneity is accompanied by a lack of unifying frameworks for parallel optimization across methodologies, application fields, and problems, and it has finally led to an overall fragmented picture of what has been achieved and what still needs to be done in parallel optimization in OR. This review addresses the aforementioned issues with three contributions. First, we suggest a new integrative framework of parallel computational optimization across optimization problems, algorithms, and application domains. The framework integrates the perspectives of algorithmic design and computational implementation of parallel optimization. Second, we apply the framework to synthesize prior research on parallel optimization in OR, focusing on computational studies published in the period 2008-2017. Finally, we suggest research directions for parallel optimization in OR.