Evolution of NVIDIA GPU architectures.

Source publication
Article
With the technological development of the medical industry, the volume of data to be processed has exploded, and computation time also increases due to many factors, such as 3D and 4D treatment planning, the increasing sophistication of MRI pulse sequences, and the growing complexity of algorithms. The graphics processing unit (GPU) addresses these problems and provides solutions for...

Context in source publication

Context 1
... The rapid development of NVIDIA GPUs across various architectures is summarized in Table 1. NVIDIA introduced its own massively parallel architecture, called compute unified device architecture (CUDA), in 2006, revolutionizing the GPU programming model. ...
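
To make the CUDA programming model concrete, here is a minimal, self-contained sketch (the kernel and sizes are illustrative, not from the source publication): a kernel executes as a grid of thread blocks, and each thread handles one array element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread processes one element; the grid/block/thread hierarchy is
// the core abstraction CUDA added to the GPU programming model in 2006.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory, for brevity
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // grid of 256-thread blocks
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```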

Similar publications

Preprint
Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register files to reduce...
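
The register pressure described here is something CUDA programmers can influence directly. A minimal sketch (the kernel itself is illustrative): __launch_bounds__ asks the compiler to cap per-thread register usage so more blocks stay resident per multiprocessor, trading possible spills to slower local memory against occupancy, the same capacity trade-off that motivates hierarchical register files.

```cuda
#include <cuda_runtime.h>

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) caps
// per-thread register usage; fewer registers per thread allows more
// resident threads, at the risk of spilling values to local memory.
__global__ void __launch_bounds__(256, 4)
scale(int n, float a, float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= a;
}
```

Compiling with nvcc -Xptxas -v reports the resulting register count per thread.
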
Article
Background and objectives: This study provides a quantitative comparison of images created using gVirtualXray (gVXR) to both Monte Carlo (MC) and real images of clinically realistic phantoms. gVirtualXray is an open-source framework that relies on the Beer-Lambert law to simulate X-ray images in real time on a graphics processing unit (GPU) using tr...
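
The Beer-Lambert law maps naturally onto one GPU thread per detector pixel. A hedged sketch of the core computation (the names and single-material geometry are assumptions for illustration, not gVXR's actual API):

```cuda
#include <cmath>

// One thread per detector pixel: I = I0 * exp(-mu * t), where pathLen[p]
// holds the precomputed intersection length of pixel p's ray with a single
// homogeneous object (real simulators sum mu_i * t_i over materials).
__global__ void beerLambert(int nPixels, float I0, float mu,
                            const float *pathLen, float *intensity) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < nPixels)
        intensity[p] = I0 * expf(-mu * pathLen[p]);
}
```
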
Article
Deep Packet Inspection (DPI) is required by many networked application systems to guard against cyber threats. Signature-based Network Intrusion Detection Systems (NIDS) rely on packet inspection and pattern-matching mechanisms to detect malicious content in network traffic. The rapid growth of high-speed networks in d...
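
Signature matching parallelizes naturally on a GPU: one thread per candidate offset in the packet buffer. A minimal sketch of naive matching (illustrative only; production NIDS engines use automaton-based multi-pattern algorithms such as Aho-Corasick):

```cuda
// Each thread tests whether the signature occurs at one offset of the
// payload; matches[] must be zeroed before the launch.
__global__ void matchSignature(const char *payload, int payloadLen,
                               const char *sig, int sigLen,
                               unsigned char *matches) {
    int off = blockIdx.x * blockDim.x + threadIdx.x;
    if (off > payloadLen - sigLen) return;
    for (int j = 0; j < sigLen; ++j)
        if (payload[off + j] != sig[j]) return;  // mismatch: no hit here
    matches[off] = 1;                            // full signature matched
}
```
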
Chapter
This research involves the development of a compute unified device architecture (CUDA)-accelerated 2-opt local search algorithm for the traveling salesman problem (TSP). As one of the fundamental mathematical approaches to the TSP, 2-opt's time complexity has generally limited its efficiency, especially for large problem instances. Grap...
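
The 2-opt neighborhood suits the GPU because every candidate edge pair (i, j) can be scored independently. A hedged sketch of the gain computation (the flattened (i, j) indexing and Euclidean distances are assumptions, not the paper's code):

```cuda
#include <cmath>

__device__ float dist(const float2 *c, int a, int b) {
    float dx = c[a].x - c[b].x, dy = c[a].y - c[b].y;
    return sqrtf(dx * dx + dy * dy);
}

// One thread scores one 2-opt move: reversing tour[i+1..j] replaces edges
// (i,i+1) and (j,j+1) with (i,j) and (i+1,j+1); positive gain = shorter tour.
__global__ void score2opt(const int *tour, const float2 *coord, int n,
                          float *gain) {
    long long k = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (k >= (long long)n * n) return;
    int i = (int)(k / n), j = (int)(k % n);
    if (i + 1 >= j || j + 1 >= n) { gain[k] = 0.0f; return; }
    int a = tour[i], b = tour[i + 1], c = tour[j], d = tour[j + 1];
    gain[k] = dist(coord, a, b) + dist(coord, c, d)
            - dist(coord, a, c) - dist(coord, b, d);
}
```

A host-side reduction then picks the best-gain move to apply before the next iteration.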

Citations

... According to research by T. Kalaiselvi, P. Sriramakrishnan, and K. Somasundaram, the application of GPUs (graphics processing units) in medical image analysis is crucial due to the growing complexity of medical data and the need for high computational power [8]. In the field of medical image analysis, GPUs play a vital role in various aspects. Image denoising: medical images, particularly those from MRI, often suffer from random noise introduced during acquisition, measurement, and transmission. ...
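
Denoising is typically the most data-parallel of these tasks, since each output pixel depends only on a small neighborhood. A minimal sketch of a 3×3 mean filter (a deliberately simple stand-in for the MRI denoisers the survey covers):

```cuda
// 3x3 box filter: one thread per pixel, averaging a clamped neighborhood.
__global__ void meanFilter3x3(const float *in, float *out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);  // clamp at image borders
            int yy = min(max(y + dy, 0), h - 1);
            sum += in[yy * w + xx];
        }
    out[y * w + x] = sum / 9.0f;
}
```
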
Article
In the ever-evolving realm of technology, Artificial Intelligence (AI) has ushered in a transformative era, reshaping our interactions with digital systems, and expanding the horizons of machine capabilities. At the core of this AI revolution are specialized hardware entities known as AI accelerators. These accelerators, including Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs), play a pivotal role in advancing AI applications across diverse domains. This paper delves into these accelerators, offering an in-depth exploration of their unique attributes and application domains. GPUs, initially designed for graphics, have evolved into versatile tools, thanks to their parallel computing prowess and efficient memory utilization. FPGAs, with reconfigurability and low latency, prove valuable in aerospace and neural network implementations, though they come with cost and expertise challenges. ASICs, engineered for specific functions, excel in performance and power efficiency for mass production but require significant time and resources for development. Furthermore, this paper presents practical application analyses, showcasing how these accelerators are effectively deployed in real-world scenarios. With this comprehensive exploration, readers gain a deeper understanding of AI accelerators and their transformative impact on the AI landscape.
... Many studies have explored how well GPUs perform in various applications [22][23][24][25]. These investigations compare the performance and highlight the strengths and weaknesses of popular programming platforms such as CUDA C [26][27][28], CUDA Fortran [29][30][31], OpenCL [32][33][34], OpenACC [35,36], OpenMP [37,38], and Python-based compilers and libraries like Numba, CuPy, and Python CUDA [39][40][41][42][43][44]. ...
Article
This paper examines the performance of two popular GPU programming platforms, Numba and CuPy, for Monte Carlo radiation transport calculations. We conducted tests involving random number generation and one-dimensional Monte Carlo radiation transport in plane-parallel geometry on three GPU cards: NVIDIA Tesla A100, Tesla V100, and GeForce RTX3080. We compared Numba and CuPy to each other and to our CUDA C implementation. The results show that CUDA C, as expected, has the fastest performance and highest energy efficiency, while Numba offers comparable performance when data movement is minimal. While CuPy offers ease of implementation, it performs slower for compute-heavy tasks.
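
In CUDA C terms, the per-particle kernel benchmarked here reduces to sampling exponential free paths with cuRAND. A hedged sketch of an absorption-only slab-transmission toy model (the benchmarked codes also sample scattering; names are illustrative):

```cuda
#include <curand_kernel.h>

// One thread per particle: sample free path s = -ln(u) / mu_t and tally
// whether the particle crosses a slab of the given thickness.
__global__ void transmit(int nPart, float mu_t, float thickness,
                         unsigned long long seed, unsigned int *tally) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPart) return;
    curandState st;
    curand_init(seed, p, 0, &st);                 // independent stream per thread
    float s = -logf(curand_uniform(&st)) / mu_t;  // exponential free path
    if (s > thickness) atomicAdd(tally, 1u);      // particle transmitted
}
```
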
... On the other hand, the traditional central processing unit (CPU) cannot process medical image data quickly enough due to the abrupt increase in data sizes. Instead, the GPU has emerged as a cutting-edge technology for solving challenging computational problems in the field of medicine [8]. ...
Article
The approach of using more than one processor to overcome the complexity of the different medical imaging methods that make up an overall job is known as GPU (graphics processing unit)-based parallel processing. It is extremely important for several medical imaging techniques, such as image classification, object detection, image segmentation, registration, and content-based image retrieval, since the GPU-based parallel processing approach allows software to compute time-efficiently by completing multiple computations at once. On the other hand, magnetic resonance imaging (MRI) is a non-invasive imaging technology that can depict both the shape of an anatomical structure and the biological processes of the human body. Implementing GPU-based parallel processing approaches in brain MRI analysis with these medical imaging techniques might be helpful in achieving immediate and timely image capture. Therefore, this extended review (an extension of the IWBBIO2023 conference paper) offers a thorough overview of the literature, with an emphasis on the expanding use of GPU-based parallel processing methods for the medical analysis of brain MRIs with the imaging techniques mentioned above, given the need for quicker computation to obtain early and real-time feedback in medicine. Between 2019 and 2023, we examined the articles in the literature matrix, including their tasks, techniques, MRI sequences, and processing results. As a result, the methods discussed in this review demonstrate the advances achieved so far in minimizing computing runtime, as well as the obstacles and problems still to be solved in the future.
... To achieve this goal, many researchers [12][13][14][15][16][17] have exploited different kinds of devices suitable for parallel processing and computation when data sizes are large. Among these, Graphical Processing Units (GPUs), used in different scientific applications [18,19], represent a suitable technology in the field of medical image processing. ...
Article
Hyperspectral imaging (HSI) has become a very compelling technique in different scientific areas; indeed, many researchers use it in the fields of remote sensing, agriculture, forensics, and medicine. In the latter, HSI plays a crucial role as a diagnostic support and for surgery guidance. However, the computational effort in elaborating hyperspectral data is not trivial. Furthermore, the demand for detecting diseases in a short time is undeniable. In this paper, we take up this challenge by parallelizing three machine-learning methods among those that are the most intensively used: Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGB) algorithms using the Compute Unified Device Architecture (CUDA) to accelerate the classification of hyperspectral skin cancer images. They all showed a good performance in HS image classification, in particular when the size of the dataset is limited, as demonstrated in the literature. We illustrate the parallelization techniques adopted for each approach, highlighting the suitability of Graphical Processing Units (GPUs) to this aim. Experimental results show that parallel SVM and XGB algorithms significantly improve the classification times in comparison with their serial counterparts.
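
Of the three classifiers, the SVM decision function parallelizes most directly: one thread per pixel, summing kernel evaluations against the support vectors. A hedged sketch with an RBF kernel (the data layout and names are assumptions, not the paper's implementation):

```cuda
#include <cmath>

// One thread classifies one pixel: f(x) = sum_i alphaY[i] * K(sv_i, x) + b,
// with the RBF kernel K(u, v) = exp(-gamma * ||u - v||^2).
__global__ void svmPredict(const float *pixels, int nPixels, int nBands,
                           const float *sv, const float *alphaY, int nSV,
                           float gamma, float b, float *decision) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPixels) return;
    float f = b;
    for (int i = 0; i < nSV; ++i) {
        float d2 = 0.0f;
        for (int k = 0; k < nBands; ++k) {
            float d = sv[i * nBands + k] - pixels[p * nBands + k];
            d2 += d * d;
        }
        f += alphaY[i] * expf(-gamma * d2);
    }
    decision[p] = f;  // the sign gives the predicted class
}
```
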
... In recent years, the integration of graphics processing units (GPUs) has proven to be highly effective in reducing execution time and has revolutionized computational efficiency and accelerated processing, particularly in the field of medical imaging [16][17][18]. GPU platforms provide means to accelerate specific computational tasks and algorithms, surpassing the performance of central processing units (CPUs) while retaining a desirable level of flexibility [19]. In our study, we focus on implementing AD on GPUs to achieve enhanced computational speed; additionally, we augment GPU AD with the GPU Levenberg-Marquardt (LM) algorithm to extract object parameters from optical spectral data. ...
... where k is the number of steps from 0 to n_d − 1. The matrices calculated under Equation (17) obtain an index of 0. After that, with each doubling of the layer, the index increases by 1. ...
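
To make that indexing concrete: the seed matrices carry index 0, and each doubling increments the index, so index k corresponds to a layer 2^k times the initial thickness. A tiny host-side sketch (names are illustrative):

```cuda
#include <cstdio>

// Doubling schedule: index k covers thickness d0 * 2^k, for k = 0 .. n_d-1.
void printDoublingSchedule(float d0, int n_d) {
    for (int k = 0; k < n_d; ++k)
        printf("index %d -> thickness %g\n", k, d0 * (float)(1 << k));
}
```
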
Article
The Adding-Doubling (AD) algorithm is a general analytical solution of the radiative transfer equation (RTE). AD offers a favorable balance between accuracy and computational efficiency, surpassing other RTE solutions, such as Monte Carlo (MC) simulations, in terms of speed while outperforming approximate solutions like the Diffusion Approximation method in accuracy. While AD algorithms have traditionally been implemented on central processing units (CPUs), this study focuses on leveraging the capabilities of graphics processing units (GPUs) to achieve enhanced computational speed. In terms of processing speed, the GPU AD algorithm showed an improvement by a factor of about 5000 to 40,000 compared to the GPU MC method. The optimal number of threads for this algorithm was found to be approximately 3000. To illustrate the utility of the GPU AD algorithm, the Levenberg–Marquardt inverse solution was used to extract object parameters from optical spectral data of human skin under various hemodynamic conditions. With regards to computational efficiency, it took approximately 5 min to process a 220 × 100 × 61 image (x-axis × y-axis × spectral-axis). The development of the GPU AD algorithm presents an advancement in determining tissue properties compared to other RTE solutions. Moreover, the GPU AD method itself holds the potential to expedite machine learning techniques in the analysis of spectral images.
... Recent studies show the improvements that parallel programming can bring in various fields, taking advantage of the graphics cards of personal computers, which are increasingly accessible and can be used to run various parallel algorithms. Thus, combining high performance and low cost, GPU programming is currently used in fields such as molecular dynamics [14], medical imaging [15], financial simulation [24], geoscience simulations [20], fast 2D interpolations [4], graph theory [6], etc. ...
Article
Advances in graphics processing unit (GPU) development gave software developers the opportunity to increase the execution speed of their programs by massively parallelizing their algorithms using GPU programming. NVIDIA developed an architecture for parallel computing named Compute Unified Device Architecture (CUDA), which includes a set of CUDA instructions and the hardware for parallel computing. Computational physics is an interdisciplinary field in continuous progress that studies, develops, and optimizes numerical algorithms and computational techniques for application to various physics problems. Computational physics is applicable to all sub-branches of physics and related fields, such as biophysics, astrophysics, plasma physics, biomechanics, and fluid physics. Moreover, with the evolution of technology over the last few decades, this relatively new field has helped obtain results in these fields quickly, facilitating the connection between theoretical and experimental physics. In this paper, some of the latest research and results obtained in computational physics using GPU computing with the CUDA architecture are reviewed.
... Images obtained through camera systems cannot be processed precisely by a traditional central processing unit (CPU) with a limited number of cores (one, two, four, or eight), because CPUs with a traditional handful of cores cannot keep up with such large volumes of data [12]. On a single-core CPU, threads are executed by the operating system in a time-sharing manner according to their priorities and the current situation. ...
Article
The Coronavirus disease, which emerged in Wuhan, China in December 2019 and spread rapidly all over the world, infected healthy people through transmission by small droplets. Medical experts have stated that the most effective defense against the Coronavirus disease is for people in contact to wear masks. Despite this, some people violated the obligation to wear masks. In this study, the mask detection performance of pre-trained Convolutional Neural Network (CNN) models such as NasNetMobile, MobileNetV3Small, ResNet50, DenseNet121, and EfficientNetV2B0 was evaluated in order to automatically detect people who violate the mask-wearing obligation. At the end of this evaluation, the DenseNet121 architecture proved the most successful model. This model was tested with images obtained from the camera on a robotic system with six Degrees of Freedom (6-DOF). The human face images taken from the camera were processed using the Jetson Xavier NX development board. As a result, this study will help officers who carry out mask inspections in public areas and will significantly reduce the spread of new outbreaks similar to the Coronavirus.
... Data-race freedom considers the interaction between every pair of threads among a fixed number of threads. Guaranteeing the absence of data-races is especially important when GPU programs are used in critical software (e.g., self-driving cars [5] and medical imaging [6]). Indeed, in such settings, bugs must be found before the code is executed by customers to avoid potentially catastrophic consequences. ...
Article
GPUs offer parallelism as a commodity, but they are difficult to program correctly. Static analyzers that guarantee data-race freedom (DRF) are essential to help programmers establish the correctness of their programs (kernels). However, existing approaches produce too many false alarms and struggle to handle larger programs. To address these limitations we formalize a novel compositional analysis for DRF, based on memory access protocols. These protocols are behavioral types that codify the way threads interact over shared memory. Our work includes fully mechanized proofs of our theoretical results, the first mechanized proofs in the field of DRF analysis for GPU kernels. Our theory is implemented in Faial, a tool that outperforms the state-of-the-art. Notably, it can correctly verify at least 1.42× more real-world kernels, and it exhibits linear growth in 4 out of 5 experiments, while others grow exponentially in all 5 experiments.
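
A concrete instance of the data races such analyzers target: threads of one block updating the same shared-memory cell without synchronization. A minimal sketch of a racy kernel and one standard repair (illustrative; not taken from the Faial paper):

```cuda
__global__ void racySum(int *out) {
    __shared__ int acc;
    acc = 0;                      // RACE: every thread writes acc
    acc += threadIdx.x;           // RACE: unsynchronized read-modify-write
    if (threadIdx.x == 0) *out = acc;
}

__global__ void safeSum(int *out) {
    __shared__ int acc;
    if (threadIdx.x == 0) acc = 0;
    __syncthreads();                     // order initialization before updates
    atomicAdd(&acc, (int)threadIdx.x);   // serialize conflicting updates
    __syncthreads();                     // order updates before the read
    if (threadIdx.x == 0) *out = acc;
}
```
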
... Although GPUs were originally designed for image rendering and animation, their architecture with many processing cores and high throughput has made them useful for accelerating other applications such as graphic modeling and simulation, general computing, and AI [40][41][42]. The parallel processing capabilities of GPUs can significantly increase the processing speed for tasks that can be divided into smaller sub-tasks. ...
Article
This paper proposes an efficient approach for simulating volumetric deformable objects using the Position-Based Dynamics (PBD) method. Volumetric bodies generated by TetGen are used to represent three-dimensional objects, which accurately capture complex shapes and volumes. However, when a large number of constraints are applied to the system to solve using serialized algorithms on central processing units (CPU), the computational cost can become a bottleneck of the simulation. To address this issue, the proposed implementation algorithm takes advantage of graphic processing unit (GPU) acceleration and parallel processing to improve the efficiency of the simulation. We propose two specific contributions: firstly, the use of the PBD method with volume constraint for tetrahedral elements to simulate volumetric deformable objects realistically; secondly, an efficient GPU-accelerated algorithm for implementing the PBD method that significantly improves computational efficiency. We also applied the node-centric and constraint-centric algorithms to solve the stretch constraint in the GPU-based algorithm. The implementation was performed using Unity3D. The compute shader feature of Unity3D was utilized to perform thousands of parallel computations in a single pass, making it possible to simulate large and complex objects in real-time. The performance of the simulation can be accelerated by using GPU-based methods with stretch and bending constraints, which provides significant speedup factors compared to using only the CPU for deformable objects such as Bunny, Armadillo, and Dragon. The constraint-centric and node-centric GPU approaches provide speedup factors of up to 8.9x and 8x, respectively, while the GPU-based methods with all types of constraints exhibit a slight decrease but still operate at real-time speeds. Overall, this approach enables the simulation of complex and irregular shapes with plausible and realistic results, while also achieving speed, robustness, and flexibility. Additionally, the proposed approach can be applied to general simulation and other game engines that support GPU-based acceleration.
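
The stretch (distance) constraint solved per edge is the standard PBD position correction. A hedged sketch of a constraint-centric kernel in CUDA terms (the paper itself uses Unity3D compute shaders; one thread per edge, with atomic accumulation because edges share particles):

```cuda
#include <cmath>

// Project one distance constraint between particles a and b toward its rest
// length; corrections are weighted by inverse masses and applied atomically.
__global__ void projectStretch(float3 *pos, const float *invMass,
                               const int2 *edge, const float *rest,
                               int nEdges, float stiffness) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= nEdges) return;
    int a = edge[e].x, b = edge[e].y;
    float dx = pos[a].x - pos[b].x;
    float dy = pos[a].y - pos[b].y;
    float dz = pos[a].z - pos[b].z;
    float len = sqrtf(dx * dx + dy * dy + dz * dz);
    float wSum = invMass[a] + invMass[b];
    if (len < 1e-7f || wSum == 0.0f) return;
    float s = stiffness * (len - rest[e]) / (len * wSum);
    atomicAdd(&pos[a].x, -invMass[a] * s * dx);  // pull a toward b ...
    atomicAdd(&pos[a].y, -invMass[a] * s * dy);
    atomicAdd(&pos[a].z, -invMass[a] * s * dz);
    atomicAdd(&pos[b].x,  invMass[b] * s * dx);  // ... and b toward a
    atomicAdd(&pos[b].y,  invMass[b] * s * dy);
    atomicAdd(&pos[b].z,  invMass[b] * s * dz);
}
```
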
... Image segmentation, also called tagging, is the process of dividing the individual elements of an image into groups such that all the elements in a group share a common feature. In medicine, this common feature is usually that the elements belong to the same type of tissue or organ [19][20][21][22][23]. Different algorithms and methods can be used for segmentation. ...
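
The simplest such grouping is intensity thresholding, which is embarrassingly parallel. A minimal sketch that assigns each voxel a binary label from a fixed intensity range (the thresholds are illustrative):

```cuda
// One thread labels one voxel: 1 if its intensity falls in [lo, hi]
// (the tissue class of interest), 0 otherwise (background).
__global__ void thresholdLabel(const float *vol, unsigned char *label,
                               int nVox, float lo, float hi) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < nVox)
        label[v] = (vol[v] >= lo && vol[v] <= hi) ? 1 : 0;
}
```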