Hardware architecture of the CPU and GPU [13].

Source publication
In this study, a CUDA Fortran-based, GPU-accelerated Laplace equation model was developed and applied to several cases. The Laplace equation is one of the equations used to describe groundwater flow physically, and it admits analytical solutions. Such a numerical model requires a large amount of data to physically regenera...
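As a rough illustration of the kind of computation this abstract describes, the sketch below solves the 2D Laplace equation on a small grid with Jacobi iteration in plain Python. It is not the authors' CUDA Fortran model: the function name `jacobi_laplace`, the tolerance, the iteration cap, and the choice of Jacobi iteration are all assumptions made for illustration. Each interior update reads only the previous iterate, so the updates are independent of one another, which is the property that lets a stencil like this be spread across GPU threads.

```python
def jacobi_laplace(grid, tol=1e-10, max_iters=10_000):
    """Relax the interior of `grid` toward a solution of the 2D Laplace equation.

    Hypothetical helper for illustration. Boundary cells are treated as
    fixed (Dirichlet) values; interior cells are repeatedly replaced by
    the average of their four neighbours taken from the previous iterate.
    Returns the relaxed grid and the number of iterations performed.
    """
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]
    for iteration in range(1, max_iters + 1):
        nxt = [row[:] for row in current]  # boundary values carry over unchanged
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Five-point stencil: every update uses only `current`,
                # so all interior cells could be computed in parallel.
                nxt[i][j] = 0.25 * (
                    current[i - 1][j] + current[i + 1][j]
                    + current[i][j - 1] + current[i][j + 1]
                )
                max_change = max(max_change, abs(nxt[i][j] - current[i][j]))
        current = nxt
        if max_change < tol:
            return current, iteration
    return current, max_iters
```

One way to check the sketch is to fix the boundary to a linear profile, for example the value `j / (n - 1)` in column `j`, and confirm that the interior relaxes to the same linear profile, since a linear function satisfies the five-point stencil exactly.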

Citations

... The evaluation results have shown the improved performance of the GPU compared to the CPU in terms of the execution time of moving person detection on different video bases. A CUDA Fortran-based GPU-accelerated Laplace equation model has been implemented to reduce the computational time (Kim, Yoon, & Kim, 2021). The workflow of CUDA data processing is shown in Figure 4. ...
GPUs (Graphics Processing Units) are widely used due to their impressive computational power and parallel computing ability. They have shown significant potential in improving the performance of HPC applications, owing to their highly parallel architecture, which allows multiple tasks to execute simultaneously. GPU computing is often treated as synonymous with CUDA, which offers enhanced development tools and comprehensive documentation to increase performance, while AMD's ROCm platform features an application programming interface compatible with CUDA. Hence, the main objective of this systematic literature review is to thoroughly analyze and evaluate the performance characteristics of two prominent GPU computing frameworks, namely NVIDIA's CUDA and AMD's ROCm (Radeon Open Compute). By examining the strengths, weaknesses, and overall performance capabilities of CUDA and ROCm, a deeper understanding of these frameworks is gained, which will benefit researchers. The purpose of the research on GPU-accelerated HPC is to provide a comprehensive and unbiased overview of the current state of research and development in this area. It can help researchers, practitioners, and policymakers understand the role of GPUs in HPC and facilitate evidence-based decision making. In addition, different real-time applications of the CUDA and ROCm platforms are discussed to explore potential performance benefits and trade-offs in leveraging these techniques. The insights provided by the study will help readers make well-informed decisions when choosing between CUDA and ROCm for real-world software.
... Since the beginning of this century, the graphics processing unit (GPU) has received increasing attention due to its powerful parallel processing capability. The introduction of the compute unified device architecture (CUDA) programming model by NVIDIA made the GPU available for general-purpose parallel computing [7]. ...
The range migration algorithm (RMA) based on the Fourier transform is widely applied in millimeter-wave (MMW) close-range imaging because it requires few operations and involves little approximation. However, its interpolation stage is inefficient because of the intensive logic control involved, which limits its speed on a graphics processing unit (GPU) platform. Therefore, in this paper, we present an acceleration optimization method based on hybrid GPU and central processing unit (CPU) parallel computation for implementing the RMA. The proposed method exploits the strong logic-control capability of the CPU to assist the GPU in processing the logic controls of the interpolation stage. The common positions of the wavenumber-domain components to be interpolated are calculated by the CPU and stored in constant memory, from which they can be broadcast at any time. This avoids the repetitive computation incurred in a GPU-only scheme. The GPU is then responsible for the remaining matrix-related steps and outputs the needed wavenumber-domain values. The imaging experiments verify the acceleration efficiency of the proposed method and demonstrate a speedup of more than 15 times over the CPU-only method and more than 2 times over the GPU-only method.
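The idea of computing the shared interpolation positions once and reusing them for every row can be sketched in plain Python as follows. This is only an illustration of the precompute-then-reuse pattern, not the paper's implementation: the function names `precompute_positions` and `interpolate_rows`, the use of linear interpolation, and the list-based table (standing in for GPU constant memory) are all assumptions.

```python
from bisect import bisect_right


def precompute_positions(src_axis, dst_axis):
    """Find, once, the left-neighbour index and linear weight for each target sample.

    Hypothetical helper: `src_axis` is an increasing list of source
    coordinates and `dst_axis` holds the coordinates to interpolate at.
    This is the branch-heavy part that the abstract assigns to the CPU.
    """
    last = len(src_axis) - 2  # largest valid left-neighbour index
    table = []
    for x in dst_axis:
        k = min(max(bisect_right(src_axis, x) - 1, 0), last)
        w = (x - src_axis[k]) / (src_axis[k + 1] - src_axis[k])
        table.append((k, w))
    return table


def interpolate_rows(rows, table):
    """Apply the same precomputed (index, weight) table to every row.

    No searching or branching happens here, only multiply-adds, which is
    the kind of regular work that suits a GPU.
    """
    return [
        [(1.0 - w) * row[k] + w * row[k + 1] for k, w in table]
        for row in rows
    ]
```

Because the table depends only on the two coordinate axes and not on the data, it is built a single time regardless of how many rows are resampled; this mirrors the way the abstract describes avoiding repeated position computation in a GPU-only scheme.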