Evolution of NVIDIA GPU architectures.

Source publication
Article
With the technological development of the medical industry, the volume of data to be processed has exploded, and computation time also increases due to many factors, such as 3D and 4D treatment planning, the increasing sophistication of MRI pulse sequences, and the growing complexity of algorithms. The graphics processing unit (GPU) addresses these problems and provides solutions for...

Context in source publication

Context 1
... The rapid development of NVIDIA GPUs across various architectures is summarized in Table 1. NVIDIA introduced its own massively parallel architecture, called compute unified device architecture (CUDA), in 2006, revolutionizing the GPU programming model. ...
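
To make the CUDA programming model concrete, here is a minimal, self-contained sketch (the kernel and sizes are illustrative, not from the source publication): a kernel executes as a grid of thread blocks, and each thread handles one array element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread processes one element; the grid/block/thread hierarchy is
// the core abstraction CUDA added to the GPU programming model in 2006.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory, for brevity
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // grid of 256-thread blocks
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```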

Similar publications

Preprint
Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register files to reduce...
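
The register pressure described here is something CUDA programmers can influence directly. A minimal sketch (the kernel itself is illustrative): __launch_bounds__ asks the compiler to cap per-thread register usage so more blocks stay resident per multiprocessor, trading possible spills to slower local memory against occupancy, the same capacity trade-off that motivates hierarchical register files.

```cuda
#include <cuda_runtime.h>

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) caps
// per-thread register usage; fewer registers per thread allows more
// resident threads, at the risk of spilling values to local memory.
__global__ void __launch_bounds__(256, 4)
scale(int n, float a, float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= a;
}
```

Compiling with nvcc -Xptxas -v reports the resulting register count per thread.
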
Article
Background and objectives: This study provides a quantitative comparison of images created using gVirtualXray (gVXR) to both Monte Carlo (MC) and real images of clinically realistic phantoms. gVirtualXray is an open-source framework that relies on the Beer-Lambert law to simulate X-ray images in real time on a graphics processing unit (GPU) using tr...
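
The Beer-Lambert law maps naturally onto one GPU thread per detector pixel. A hedged sketch of the core computation (the names and single-material geometry are assumptions for illustration, not gVXR's actual API):

```cuda
#include <cmath>

// One thread per detector pixel: I = I0 * exp(-mu * t), where pathLen[p]
// holds the precomputed intersection length of pixel p's ray with a single
// homogeneous object (real simulators sum mu_i * t_i over materials).
__global__ void beerLambert(int nPixels, float I0, float mu,
                            const float *pathLen, float *intensity) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < nPixels)
        intensity[p] = I0 * expf(-mu * pathLen[p]);
}
```
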
Article
Deep Packet Inspection (DPI) is required by many networked application systems to guard against cyber threats. Signature-based Network Intrusion Detection Systems (NIDS) rely on packet inspection and pattern-matching mechanisms to detect malicious content in network traffic. The rapid growth of high-speed networks in d...
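
Signature matching parallelizes naturally on a GPU: one thread per candidate offset in the packet buffer. A minimal sketch of naive matching (illustrative only; production NIDS engines use automaton-based multi-pattern algorithms such as Aho-Corasick):

```cuda
// Each thread tests whether the signature occurs at one offset of the
// payload; matches[] must be zeroed before the launch.
__global__ void matchSignature(const char *payload, int payloadLen,
                               const char *sig, int sigLen,
                               unsigned char *matches) {
    int off = blockIdx.x * blockDim.x + threadIdx.x;
    if (off > payloadLen - sigLen) return;
    for (int j = 0; j < sigLen; ++j)
        if (payload[off + j] != sig[j]) return;  // mismatch: no hit here
    matches[off] = 1;                            // full signature matched
}
```
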
Chapter
This research involves the development of a compute unified device architecture (CUDA)-accelerated 2-opt local search algorithm for the traveling salesman problem (TSP). As one of the fundamental mathematical approaches to the TSP, 2-opt's time complexity has generally limited its efficiency, especially for large problem instances. Grap...
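
The 2-opt neighborhood suits the GPU because every candidate edge pair (i, j) can be scored independently. A hedged sketch of the gain computation (the flattened (i, j) indexing and Euclidean distances are assumptions, not the paper's code):

```cuda
#include <cmath>

__device__ float dist(const float2 *c, int a, int b) {
    float dx = c[a].x - c[b].x, dy = c[a].y - c[b].y;
    return sqrtf(dx * dx + dy * dy);
}

// One thread scores one 2-opt move: reversing tour[i+1..j] replaces edges
// (i,i+1) and (j,j+1) with (i,j) and (i+1,j+1); positive gain = shorter tour.
__global__ void score2opt(const int *tour, const float2 *coord, int n,
                          float *gain) {
    long long k = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (k >= (long long)n * n) return;
    int i = (int)(k / n), j = (int)(k % n);
    if (i + 1 >= j || j + 1 >= n) { gain[k] = 0.0f; return; }
    int a = tour[i], b = tour[i + 1], c = tour[j], d = tour[j + 1];
    gain[k] = dist(coord, a, b) + dist(coord, c, d)
            - dist(coord, a, c) - dist(coord, b, d);
}
```

A host-side reduction then picks the best-gain move to apply before the next iteration.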

Citations

... According to research by T. Kalaiselvi, P. Sriramakrishnan, and K. Somasundaram, the application of GPUs (graphics processing units) in medical image analysis is crucial due to the growing complexity of medical data and the need for high computational power [8]. In the field of medical image analysis, GPUs play a vital role in various aspects. Image denoising: medical images, particularly those from MRI, often suffer from random noise introduced during acquisition, measurement, and transmission. ...
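
Denoising is typically the most data-parallel of these tasks, since each output pixel depends only on a small neighborhood. A minimal sketch of a 3×3 mean filter (a deliberately simple stand-in for the MRI denoisers the survey covers):

```cuda
// 3x3 box filter: one thread per pixel, averaging a clamped neighborhood.
__global__ void meanFilter3x3(const float *in, float *out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);  // clamp at image borders
            int yy = min(max(y + dy, 0), h - 1);
            sum += in[yy * w + xx];
        }
    out[y * w + x] = sum / 9.0f;
}
```
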
Article
In the ever-evolving realm of technology, Artificial Intelligence (AI) has ushered in a transformative era, reshaping our interactions with digital systems, and expanding the horizons of machine capabilities. At the core of this AI revolution are specialized hardware entities known as AI accelerators. These accelerators, including Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs), play a pivotal role in advancing AI applications across diverse domains. This paper delves into these accelerators, offering an in-depth exploration of their unique attributes and application domains. GPUs, initially designed for graphics, have evolved into versatile tools, thanks to their parallel computing prowess and efficient memory utilization. FPGAs, with reconfigurability and low latency, prove valuable in aerospace and neural network implementations, though they come with cost and expertise challenges. ASICs, engineered for specific functions, excel in performance and power efficiency for mass production but require significant time and resources for development. Furthermore, this paper presents practical application analyses, showcasing how these accelerators are effectively deployed in real-world scenarios. With this comprehensive exploration, readers gain a deeper understanding of AI accelerators and their transformative impact on the AI landscape.
... Many studies have explored how well GPUs perform in various applications [22][23][24][25]. These investigations compare the performance and highlight the strengths and weaknesses of popular programming platforms such as CUDA C [26][27][28], CUDA Fortran [29][30][31], OpenCL [32][33][34], OpenACC [35,36], OpenMP [37,38], and Python-based compilers and libraries like Numba, CuPy, and Python CUDA [39][40][41][42][43][44]. ...
Article
This paper examines the performance of two popular GPU programming platforms, Numba and CuPy, for Monte Carlo radiation transport calculations. We conducted tests involving random number generation and one-dimensional Monte Carlo radiation transport in plane-parallel geometry on three GPU cards: NVIDIA Tesla A100, Tesla V100, and GeForce RTX3080. We compared Numba and CuPy to each other and to our CUDA C implementation. The results show that CUDA C, as expected, has the fastest performance and highest energy efficiency, while Numba offers comparable performance when data movement is minimal. While CuPy offers ease of implementation, it performs slower for compute-heavy tasks.
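
In CUDA C terms, the per-particle kernel benchmarked here reduces to sampling exponential free paths with cuRAND. A hedged sketch of an absorption-only slab-transmission toy model (the benchmarked codes also sample scattering; names are illustrative):

```cuda
#include <curand_kernel.h>

// One thread per particle: sample free path s = -ln(u) / mu_t and tally
// whether the particle crosses a slab of the given thickness.
__global__ void transmit(int nPart, float mu_t, float thickness,
                         unsigned long long seed, unsigned int *tally) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPart) return;
    curandState st;
    curand_init(seed, p, 0, &st);                 // independent stream per thread
    float s = -logf(curand_uniform(&st)) / mu_t;  // exponential free path
    if (s > thickness) atomicAdd(tally, 1u);      // particle transmitted
}
```
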
... On the other hand, the traditional central processing unit (CPU) cannot process medical image data quickly enough due to the abrupt increase in data sizes. Instead, the GPU has emerged as a cutting-edge technology for solving challenging computational problems in the field of medicine [8]. ...
Article
The approach of using more than one processor to overcome the complexity of the different medical imaging methods that make up an overall job is known as GPU (graphics processing unit)-based parallel processing. It is extremely important for several medical imaging techniques, such as image classification, object detection, image segmentation, registration, and content-based image retrieval, since the GPU-based parallel processing approach allows software to compute time-efficiently by completing multiple computations at once. On the other hand, magnetic resonance imaging (MRI) is a non-invasive imaging technology that can depict both the shape of an anatomical structure and the biological processes of the human body. Implementing GPU-based parallel processing approaches in brain MRI analysis with these medical imaging techniques might be helpful in achieving immediate and timely image capture. Therefore, this extended review (an extension of the IWBBIO2023 conference paper) offers a thorough overview of the literature, with an emphasis on the expanding use of GPU-based parallel processing methods for the medical analysis of brain MRIs with the imaging techniques mentioned above, given the need for quicker computation to obtain early and real-time feedback in medicine. Between 2019 and 2023, we examined the articles in the literature matrix, including their tasks, techniques, MRI sequences, and processing results. As a result, the methods discussed in this review demonstrate the advances achieved so far in minimizing computing runtime, as well as the obstacles and problems still to be solved in the future.
... To achieve this goal, many researchers [12][13][14][15][16][17] have exploited different kinds of devices suitable for parallel processing and computation when data sizes are large. Among these, Graphical Processing Units (GPUs), used in different scientific applications [18,19], represent a suitable technology in the field of medical image processing. ...
Article
Hyperspectral imaging (HSI) has become a very compelling technique in different scientific areas; indeed, many researchers use it in the fields of remote sensing, agriculture, forensics, and medicine. In the latter, HSI plays a crucial role as a diagnostic support and for surgery guidance. However, the computational effort in elaborating hyperspectral data is not trivial. Furthermore, the demand for detecting diseases in a short time is undeniable. In this paper, we take up this challenge by parallelizing three machine-learning methods among those that are the most intensively used: Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGB) algorithms using the Compute Unified Device Architecture (CUDA) to accelerate the classification of hyperspectral skin cancer images. They all showed a good performance in HS image classification, in particular when the size of the dataset is limited, as demonstrated in the literature. We illustrate the parallelization techniques adopted for each approach, highlighting the suitability of Graphical Processing Units (GPUs) to this aim. Experimental results show that parallel SVM and XGB algorithms significantly improve the classification times in comparison with their serial counterparts.
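
Of the three classifiers, the SVM decision function parallelizes most directly: one thread per pixel, summing kernel evaluations against the support vectors. A hedged sketch with an RBF kernel (the data layout and names are assumptions, not the paper's implementation):

```cuda
#include <cmath>

// One thread classifies one pixel: f(x) = sum_i alphaY[i] * K(sv_i, x) + b,
// with the RBF kernel K(u, v) = exp(-gamma * ||u - v||^2).
__global__ void svmPredict(const float *pixels, int nPixels, int nBands,
                           const float *sv, const float *alphaY, int nSV,
                           float gamma, float b, float *decision) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPixels) return;
    float f = b;
    for (int i = 0; i < nSV; ++i) {
        float d2 = 0.0f;
        for (int k = 0; k < nBands; ++k) {
            float d = sv[i * nBands + k] - pixels[p * nBands + k];
            d2 += d * d;
        }
        f += alphaY[i] * expf(-gamma * d2);
    }
    decision[p] = f;  // the sign gives the predicted class
}
```
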
... In recent years, the integration of graphics processing units (GPUs) has proven to be highly effective in reducing execution time and has revolutionized computational efficiency and accelerated processing, particularly in the field of medical imaging [16][17][18]. GPU platforms provide means to accelerate specific computational tasks and algorithms, surpassing the performance of central processing units (CPUs) while retaining a desirable level of flexibility [19]. In our study, we focus on implementing AD on GPUs to achieve enhanced computational speed; additionally, we augment GPU AD with the GPU Levenberg-Marquardt (LM) algorithm to extract object parameters from optical spectral data. ...
... where k is the number of steps from 0 to n_d − 1. The matrices calculated under Equation (17) obtain an index of 0. After that, with each doubling of the layer, the index increases by 1. ...
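
To make that indexing concrete: the seed matrices carry index 0, and each doubling increments the index, so index k corresponds to a layer 2^k times the initial thickness. A tiny host-side sketch (names are illustrative):

```cuda
#include <cstdio>

// Doubling schedule: index k covers thickness d0 * 2^k, for k = 0 .. n_d-1.
void printDoublingSchedule(float d0, int n_d) {
    for (int k = 0; k < n_d; ++k)
        printf("index %d -> thickness %g\n", k, d0 * (float)(1 << k));
}
```
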
Article
The Adding-Doubling (AD) algorithm is a general analytical solution of the radiative transfer equation (RTE). AD offers a favorable balance between accuracy and computational efficiency, surpassing other RTE solutions, such as Monte Carlo (MC) simulations, in terms of speed while outperforming approximate solutions like the Diffusion Approximation method in accuracy. While AD algorithms have traditionally been implemented on central processing units (CPUs), this study focuses on leveraging the capabilities of graphics processing units (GPUs) to achieve enhanced computational speed. In terms of processing speed, the GPU AD algorithm showed an improvement by a factor of about 5000 to 40,000 compared to the GPU MC method. The optimal number of threads for this algorithm was found to be approximately 3000. To illustrate the utility of the GPU AD algorithm, the Levenberg–Marquardt inverse solution was used to extract object parameters from optical spectral data of human skin under various hemodynamic conditions. With regards to computational efficiency, it took approximately 5 min to process a 220 × 100 × 61 image (x-axis × y-axis × spectral-axis). The development of the GPU AD algorithm presents an advancement in determining tissue properties compared to other RTE solutions. Moreover, the GPU AD method itself holds the potential to expedite machine learning techniques in the analysis of spectral images.
... Recent studies show the improvements that parallel programming can bring in various fields, taking advantage of the graphics cards of personal computers, which are increasingly accessible and can be used to run various parallel algorithms. Thus, combining high performance and low cost, GPU programming is currently used in fields such as molecular dynamics [14], medical imaging [15], financial simulation [24], geoscience simulations [20], fast 2D interpolations [4], graph theory [6], etc. ...
Article
Advances in graphics processing unit (GPU) development gave software developers the opportunity to increase the execution speed of their programs by massively parallelizing their algorithms using GPU programming. NVIDIA developed an architecture for parallel computing named Compute Unified Device Architecture (CUDA), which includes a set of CUDA instructions and the hardware for parallel computing. Computational physics is an interdisciplinary field in continuous progress that studies, develops, and optimizes numerical algorithms and computational techniques for application to various physics problems. Computational physics is applicable to all sub-branches of physics and related fields, such as biophysics, astrophysics, plasma physics, biomechanics, and fluid physics. Moreover, with the evolution of technology over the last few decades, this relatively new field has helped obtain results in these fields quickly, facilitating the connection between theoretical and experimental physics. In this paper, some of the latest research and results obtained in computational physics using GPU computing with the CUDA architecture are reviewed.
... Images obtained through camera systems cannot be processed precisely by a traditional central processing unit (CPU) with a limited number of cores (one, two, four, or eight), because CPUs with a traditional handful of cores cannot keep up with such large volumes of data [12]. On a single-core CPU, threads are executed by the operating system in a time-sharing manner according to their priorities and the current situation. ...
Article
The Coronavirus disease, which emerged in Wuhan, China in December 2019 and spread rapidly all over the world, infected healthy people through transmission by small droplets. Medical experts have stated that the most effective defense against the Coronavirus disease is for people in contact to wear masks. Despite this, some people violated the obligation to wear masks. In this study, the mask detection performance of pre-trained Convolutional Neural Network (CNN) models such as NasNetMobile, MobileNetV3Small, ResNet50, DenseNet121, and EfficientNetV2B0 was evaluated in order to automatically detect people who violate the mask-wearing obligation. At the end of this evaluation, the DenseNet121 architecture proved the most successful model. This model was tested with images obtained from the camera on a robotic system with six Degrees of Freedom (6-DOF). The human face images taken from the camera were processed using the Jetson Xavier NX development board. As a result, this study will help officers who carry out mask inspections in public areas and will significantly reduce the spread of new outbreaks similar to the Coronavirus.
... Data-race freedom considers the interaction between every pair of threads among a fixed number of threads. Guaranteeing the absence of data-races is especially important when GPU programs are used in critical software (e.g., self-driving cars [5] and medical imaging [6]). Indeed, in such settings, bugs must be found before the code is executed by customers to avoid potentially catastrophic consequences. ...
Article
GPUs offer parallelism as a commodity, but they are difficult to program correctly. Static analyzers that guarantee data-race freedom (DRF) are essential to help programmers establish the correctness of their programs (kernels). However, existing approaches produce too many false alarms and struggle to handle larger programs. To address these limitations we formalize a novel compositional analysis for DRF, based on memory access protocols. These protocols are behavioral types that codify the way threads interact over shared memory. Our work includes fully mechanized proofs of our theoretical results, the first mechanized proofs in the field of DRF analysis for GPU kernels. Our theory is implemented in Faial, a tool that outperforms the state-of-the-art. Notably, it can correctly verify at least 1.42× more real-world kernels, and it exhibits linear growth in 4 out of 5 experiments, while others grow exponentially in all 5 experiments.
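
A concrete instance of the data races such analyzers target: threads of one block updating the same shared-memory cell without synchronization. A minimal sketch of a racy kernel and one standard repair (illustrative; not taken from the Faial paper):

```cuda
__global__ void racySum(int *out) {
    __shared__ int acc;
    acc = 0;                      // RACE: every thread writes acc
    acc += threadIdx.x;           // RACE: unsynchronized read-modify-write
    if (threadIdx.x == 0) *out = acc;
}

__global__ void safeSum(int *out) {
    __shared__ int acc;
    if (threadIdx.x == 0) acc = 0;
    __syncthreads();                     // order initialization before updates
    atomicAdd(&acc, (int)threadIdx.x);   // serialize conflicting updates
    __syncthreads();                     // order updates before the read
    if (threadIdx.x == 0) *out = acc;
}
```
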
... Although GPUs were originally designed for image rendering and animation, their architecture with many processing cores and high throughput has made them useful for accelerating other applications such as graphic modeling and simulation, general computing, and AI [40][41][42]. The parallel processing capabilities of GPUs can significantly increase the processing speed for tasks that can be divided into smaller sub-tasks. ...
Article
This paper proposes an efficient approach for simulating volumetric deformable objects using the Position-Based Dynamics (PBD) method. Volumetric bodies generated by TetGen are used to represent three-dimensional objects, which accurately capture complex shapes and volumes. However, when a large number of constraints are applied to the system to solve using serialized algorithms on central processing units (CPU), the computational cost can become a bottleneck of the simulation. To address this issue, the proposed implementation algorithm takes advantage of graphic processing unit (GPU) acceleration and parallel processing to improve the efficiency of the simulation. We propose two specific contributions: firstly, the use of the PBD method with volume constraint for tetrahedral elements to simulate volumetric deformable objects realistically; secondly, an efficient GPU-accelerated algorithm for implementing the PBD method that significantly improves computational efficiency. We also applied the node-centric and constraint-centric algorithms to solve the stretch constraint in the GPU-based algorithm. The implementation was performed using Unity3D. The compute shader feature of Unity3D was utilized to perform thousands of parallel computations in a single pass, making it possible to simulate large and complex objects in real-time. The performance of the simulation can be accelerated by using GPU-based methods with stretch and bending constraints, which provides significant speedup factors compared to using only the CPU for deformable objects such as Bunny, Armadillo, and Dragon. The constraint-centric and node-centric GPU approaches provide speedup factors of up to 8.9x and 8x, respectively, while the GPU-based methods with all types of constraints exhibit a slight decrease but still operate at real-time speeds. Overall, this approach enables the simulation of complex and irregular shapes with plausible and realistic results, while also achieving speed, robustness, and flexibility. Additionally, the proposed approach can be applied to general simulation and other game engines that support GPU-based acceleration.
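
The stretch (distance) constraint solved per edge is the standard PBD position correction. A hedged sketch of a constraint-centric kernel in CUDA terms (the paper itself uses Unity3D compute shaders; one thread per edge, with atomic accumulation because edges share particles):

```cuda
#include <cmath>

// Project one distance constraint between particles a and b toward its rest
// length; corrections are weighted by inverse masses and applied atomically.
__global__ void projectStretch(float3 *pos, const float *invMass,
                               const int2 *edge, const float *rest,
                               int nEdges, float stiffness) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= nEdges) return;
    int a = edge[e].x, b = edge[e].y;
    float dx = pos[a].x - pos[b].x;
    float dy = pos[a].y - pos[b].y;
    float dz = pos[a].z - pos[b].z;
    float len = sqrtf(dx * dx + dy * dy + dz * dz);
    float wSum = invMass[a] + invMass[b];
    if (len < 1e-7f || wSum == 0.0f) return;
    float s = stiffness * (len - rest[e]) / (len * wSum);
    atomicAdd(&pos[a].x, -invMass[a] * s * dx);  // pull a toward b ...
    atomicAdd(&pos[a].y, -invMass[a] * s * dy);
    atomicAdd(&pos[a].z, -invMass[a] * s * dz);
    atomicAdd(&pos[b].x,  invMass[b] * s * dx);  // ... and b toward a
    atomicAdd(&pos[b].y,  invMass[b] * s * dy);
    atomicAdd(&pos[b].z,  invMass[b] * s * dz);
}
```
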
... Image segmentation, also called tagging, is the process of dividing the individual elements of an image into groups such that all the elements in a group share a common feature. In medicine, this common feature is usually that the elements belong to the same type of tissue or organ [19][20][21][22][23]. Different algorithms and methods can be used for segmentation. ...
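
The simplest such grouping is intensity thresholding, which is embarrassingly parallel. A minimal sketch that assigns each voxel a binary label from a fixed intensity range (the thresholds are illustrative):

```cuda
// One thread labels one voxel: 1 if its intensity falls in [lo, hi]
// (the tissue class of interest), 0 otherwise (background).
__global__ void thresholdLabel(const float *vol, unsigned char *label,
                               int nVox, float lo, float hi) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < nVox)
        label[v] = (vol[v] >= lo && vol[v] <= hi) ? 1 : 0;
}
```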