Fig 1 - uploaded by Takeshi Tsuji
Schematic diagram of the hardware implementation of a GPU. The GPU is the computation unit on a GPU board and includes processors and memory. Device memory sits outside the GPU chip but still on the GPU board. The numbers of multiprocessors and processors shown are those of the NVIDIA Tesla C1060.


Source publication
Article
Full-text available
Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields including shear components. Therefore, it is suitable for exploration of the responses...

Similar publications

Preprint
Full-text available
This work explores the feasibility of performing three-dimensional molecular gas dynamics simulations of hypersonic flows such as re-entry vehicles through directly solving the six-dimensional nonlinear Boltzmann equation closed with the BGK (Bhatnagar-Gross-Krook) collision model. Through the combination of high-order unstructured spatial discreti...
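The BGK collision model mentioned above relaxes the distribution function toward a local equilibrium at a rate set by a relaxation time. A minimal zero-dimensional sketch of that relaxation (the step size, relaxation time, and starting values are hypothetical illustration numbers, not from the preprint):

```python
# BGK relaxation: df/dt = (f_eq - f) / tau, integrated with forward Euler.
def bgk_step(f, f_eq, dt, tau):
    """One explicit time step of the BGK collision operator."""
    return f + (dt / tau) * (f_eq - f)

f, f_eq = 1.0, 0.0          # start away from equilibrium
dt, tau = 0.01, 1.0         # hypothetical step size and relaxation time
for _ in range(1000):
    f = bgk_step(f, f_eq, dt, tau)
# f has decayed toward f_eq (roughly exp(-10) of the initial offset)
```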
Article
Full-text available
The alternating direction method of multipliers (ADMM) is a powerful operator splitting technique for solving structured convex optimization problems. Due to its relatively low per-iteration computational cost and ability to exploit sparsity in the problem data, it is particularly suitable for large-scale optimization. However, the method may still...
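The operator splitting behind ADMM can be seen on a toy lasso-type problem: minimize 0.5*||x - b||^2 + lam*||x||_1 with the split x = z, alternating a quadratic x-update, a soft-thresholding z-update, and a dual update. A minimal sketch with made-up data (not from the article; the closed-form answer is elementwise soft-thresholding of b):

```python
# ADMM for: minimize 0.5*||x - b||^2 + lam*||x||_1, split as x = z.
def soft(v, k):
    """Soft-thresholding, the proximal operator of k*|.|_1."""
    return [max(abs(vi) - k, 0.0) * (1 if vi > 0 else -1) for vi in v]

def admm_lasso(b, lam, rho=1.0, iters=200):
    n = len(b)
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        z = soft([x[i] + u[i] for i in range(n)], lam / rho)   # sparsity step
        u = [u[i] + x[i] - z[i] for i in range(n)]             # dual ascent
    return z

sol = admm_lasso([3.0, -0.5, 1.0], lam=1.0)
# converges to elementwise soft-thresholding of b: [2.0, 0.0, 0.0]
```

The per-iteration cost is a handful of vector operations, which is why the method suits large-scale problems.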
Article
Full-text available
Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis but it is currently considered as “dark data”, i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the ne...
Conference Paper
Full-text available
It is unquestionable that successive hardware generations have significantly improved GPU computing workload performance over the last several years. Moore's law and DRAM scaling have respectively increased single-chip peak instruction throughput by 3X and off-chip bandwidth by 2.2X from NVIDIA's GeForce 8800 GTX in November 2006 to its GeForce GTX...

Citations

... A cascade of absorbing boundary conditions and an exponential-damping attenuation layer was used to remove the effects of the artificial boundaries [51]. To meet the computational demands of the FD algorithm, we applied a parallel computational architecture with a GPU to speed up our elastic wave simulation [52][53][54][55][56]. The forward modeling code was developed based on the work of Weiss and Shragge [57] in Madagascar, an open-source software package for multidimensional seismic data analysis. ...
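The exponential-damping attenuation layer cited above multiplies the wavefield near the model edges by a taper that decays smoothly toward the boundary. A sketch of such a Cerjan-style profile (the layer width and strength are hypothetical illustration values):

```python
# Exponential-damping taper for an attenuation boundary layer.
import math

def damping_profile(n, nb, alpha=0.015):
    """Multiplicative taper: 1 in the interior, decaying toward both edges."""
    w = [1.0] * n
    for i in range(nb):
        d = math.exp(-(alpha * (nb - i)) ** 2)
        w[i] = d            # left edge
        w[n - 1 - i] = d    # right edge
    return w

taper = damping_profile(n=200, nb=30)
# interior cells stay at 1.0; cells inside the layer are damped below 1
```

Applying the taper to the wavefield every time step gradually absorbs energy entering the layer, suppressing artificial edge reflections.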
Article
Full-text available
The demand for deep prospecting has led to an increase in the enthusiasm for seismic techniques in mineral exploration. Reflection seismology applications in the base metal industry have achieved success. For orogenic gold deposits, however, their applicable conditions remain to be investigated. This paper simulated seismic wave propagation based on a finite-difference algorithm with an accuracy of eighth order in space and second order in time to investigate the factors influencing the reflection seismic exploration results. Then, the paper assessed the algorithm’s feasibility for orogenic gold deposits, taking the giant Zaozigou deposit in central China as an example. The forward modeling showed that the petrophysical properties, dimensions, and dip of targets significantly affected the seismic exploration results. In the Zaozigou model, shallowly dipping orebodies were well imaged with precise extension and thickness. Steeply dipping orebodies were recognized but their thickness information was lost. Steeply dipping orebodies at depth were not detectable under a surface configuration. These problems could be effectively solved by increasing the array length and using vertical seismic profiling methods. For small orebodies, multiwave and multicomponent seismic techniques offered more valuable information in terms of mineral exploration. In conclusion, it was possible to locate orogenic gold deposits using the reflection seismology method.
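The abstract's scheme (eighth-order accuracy in space, second-order in time) corresponds to a standard central stencil. A one-dimensional sketch of a single constant-velocity time step, with illustrative grid sizes (the paper's solver is 2D/3D and more elaborate):

```python
# One time step of a 1D acoustic update, 8th-order space / 2nd-order time.
C = [-205.0/72.0, 8.0/5.0, -1.0/5.0, 8.0/315.0, -1.0/560.0]  # d2/dx2 weights

def step(u_prev, u_cur, r2):
    """u_next = 2*u_cur - u_prev + r2 * (8th-order Laplacian); r2 = (v*dt/dx)^2."""
    n = len(u_cur)
    u_next = list(u_cur)              # keep edge values; update interior only
    for i in range(4, n - 4):
        lap = C[0] * u_cur[i]
        for k in range(1, 5):
            lap += C[k] * (u_cur[i - k] + u_cur[i + k])
        u_next[i] = 2.0 * u_cur[i] - u_prev[i] + r2 * lap
    return u_next

u = [1.0] * 32
nxt = step(u, u, r2=0.2)
# a constant field is a steady state: the stencil weights sum to zero
```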
... Micikevicius (2009) and Abdelkhalek et al. (2009) discuss GPU implementations of finite-difference (FD) algorithms designed to solve the acoustic wave equation. Nakata et al. (2011) present solutions of the 3D isotropic elastic wave equation on multiple GPUs. Weiss and Shragge (2013) introduce an FD algorithm for modeling elastic wave propagation in anisotropic media and discuss both single- and multi-GPU implementations. ...
... Nowadays, hundreds of scientific applications have been migrated to the GPU, so it is impossible to mention all of them, but the most current and representative ones that have benefited from drastic speed-ups in their computing times [17] include applications for: flows in porous media [5], graph compression [6], MPI-combined libraries for image processing [4], query speed-up in databases [16], multi-physics modeling [7], solving Boltzmann transport equations [13], CFD code speed-up on non-uniform grids [19], direct modeling of gravitational fields [3], reconstructing 3D images [20], propagating acoustic waves [11], solving Lyapunov equations for control theory [8], studies of convective turbulence [2], radiative transport modeling [1], and computation of Lagrangian coherent structures [9], just to mention some of them. ...
Chapter
Full-text available
In recent years, the increasing need to speed up the execution of numerical algorithms has led researchers to the use of co-processors and graphics cards such as NVIDIA GPUs. Although the CUDA C meta-language was introduced to facilitate the development of general-purpose applications, the answer to the common question "How do I allocate (cudaMalloc) a two-dimensional array?" is not simple. In this paper, we present a memory structure that allows the use of multidimensional arrays inside a CUDA kernel. To demonstrate its functionality, this structure is applied to the explicit finite-difference solution of the non-steady heat transport equation.
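The usual workaround for the cudaMalloc question is to allocate one flat, contiguous buffer and compute row-major offsets by hand. This pure-Python sketch only mirrors the index arithmetic a CUDA kernel would perform on such a buffer (shapes and values are illustrative; the chapter's actual memory structure is more general):

```python
# Flat buffer indexed as a 2D array: element (r, c) lives at r * ncols + c.
rows, cols = 4, 6                     # illustrative grid shape
flat = [0.0] * (rows * cols)          # stands in for one cudaMalloc'd buffer

def idx(r, c, ncols):
    """Row-major offset into the flat buffer."""
    return r * ncols + c

for r in range(rows):
    for c in range(cols):
        flat[idx(r, c, cols)] = 10 * r + c   # recognizable fill pattern

# flat[idx(2, 3, cols)] now holds 23
```

Keeping the buffer contiguous is what makes coalesced memory access possible on the device.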
... Since a single GPU has limited memory and is unfit for the calculation of RTM SOGs, which require large memory space, multiple GPUs allow the expansion of the processor memory so as to effectively solve this problem [23]. ...
Article
Full-text available
As an important method for seismic data processing, reverse time migration (RTM) has high precision but involves high-intensity calculations. The calculation of RTM surface offset (shot-receiver distance) domain gathers provides intermediary data for an iterative calculation of migration and its velocity building. How to generate such data efficiently is of great significance to the industrial application of RTM. We propose a method for the calculation of surface offset gathers (SOGs) based on attribute migration wherein, using two migration calculations, the attribute profile of the surface offsets can be obtained, and thus the image results can be sorted into offset gathers. To address the high-intensity computations required for RTM, we put forth a multi-graphics processing unit (GPU) computational strategy: image computational domains are distributed to different GPUs, and multi-stream calculations are used to conceal data transmission between GPUs. The resulting computing efficiency was higher than that of a single GPU and scaled nearly linearly as more GPUs were used. A model test showed that the attribute migration method correctly outputs SOGs, while GPU parallel computation effectively improves computing efficiency. It is therefore of practical importance for this method to be extended and applied in industry.
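The two-migration idea can be sketched in miniature: run the "migration" (here reduced to a weighted stack at one image point) once on the data and once on the data scaled by each trace's offset; the ratio assigns an offset attribute to the image point. All numbers below are toy values, not the article's workflow:

```python
# Attribute-migration sketch at a single image point.
traces = [   # (offset in m, migrated amplitude contribution) - toy data
    (100.0, 0.02),
    (300.0, 0.90),   # dominant contribution
    (500.0, 0.08),
]

image     = sum(a for _, a in traces)        # ordinary migration
image_off = sum(h * a for h, a in traces)    # offset-scaled migration
attribute = image_off / image                # amplitude-weighted offset

# the attribute lands near the dominant trace's 300 m offset
```

Sorting image samples by this per-point attribute is what lets the results be binned into surface offset gathers.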
... The algorithm was also parallelized on NVIDIA GPGPU graphics processors using CUDA technology. This technology is widely used for parallelizing computational algorithms, including explicit ones [11][12][13][14][15]. It required a complete rewrite of part of the computational module for this architecture [21]. ...
... By allowing hundreds of threads to run concurrently, the GPU delivers a significant speedup over conventional CPUs for SIMD-type problems (Cheng et al., 2014). Some numerical simulation techniques for seismic wave propagation have been successfully ported to GPUs, for instance, the 2D and 3D finite-difference methods (Michéa and Komatitsch, 2010; Nakata et al., 2011; Weiss and Shragge, 2013), the finite-element method (Rietmann et al., 2012), and the Galerkin method (Klöckner et al., 2009). Reverse time migration (RTM) has also been parallelized for GPUs (Leader and Clapp, 2012; Ying et al., 2013; Liu et al., 2013). ...
Article
Full-text available
Full waveform inversion (FWI) is a challenging procedure due to the high computational cost of the modeling, especially in the elastic case. The graphics processing unit (GPU) has become a popular device for high-performance computing (HPC). To reduce the long computation time, we design and implement a GPU-based 2D elastic FWI (EFWI) in the time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of the relatively small global memory on the GPU, a boundary-saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion improves the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve a >15 times speedup in forward modeling and about a 12 times speedup in gradient calculation, compared with eight-core CPU implementations optimized with OpenMP. The results from the GPU implementations are verified to have sufficient accuracy by comparison with the results obtained from the CPU implementations.
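The boundary-saving strategy mentioned in the abstract keeps only the last snapshots of the forward wavefield plus thin boundary strips at each step, then re-runs the time recursion backward to regenerate earlier states instead of storing them all. A 1D sketch with a second-order stencil and fixed (Dirichlet) edges for brevity; grid size, step count, and the stencil are illustrative stand-ins for the paper's solver:

```python
# Boundary saving: forward run stores strips, backward run reconstructs.
import math

n, T, r2 = 101, 200, 0.25
u0 = [math.exp(-0.05 * (i - 50) ** 2) for i in range(n)]

def forward_step(u_prev, u_cur):
    nxt = [0.0] * n
    for i in range(1, n - 1):
        nxt[i] = 2*u_cur[i] - u_prev[i] + r2*(u_cur[i-1] - 2*u_cur[i] + u_cur[i+1])
    return nxt

u_prev, u_cur, saved = u0[:], u0[:], []
for _ in range(T):
    u_next = forward_step(u_prev, u_cur)
    saved.append((u_next[1], u_next[n - 2]))   # boundary strip of u[t+1]
    u_prev, u_cur = u_cur, u_next

# Backward: the same recursion solved for the earlier snapshot.
w_next, w_cur = u_cur, u_prev                  # u[T], u[T-1]
for t in range(T - 1, 0, -1):
    w_cur = list(w_cur)
    w_cur[1], w_cur[n - 2] = saved[t - 1]      # restore saved strip of u[t]
    w_prev = [0.0] * n
    for i in range(1, n - 1):
        w_prev[i] = 2*w_cur[i] - w_next[i] + r2*(w_cur[i-1] - 2*w_cur[i] + w_cur[i+1])
    w_next, w_cur = w_cur, w_prev

err = max(abs(w_cur[i] - u0[i]) for i in range(1, n - 1))
# err stays at floating-point round-off level: reconstruction matches u[0]
```

With these Dirichlet edges the strip is already recoverable, but restoring the saved values is the general pattern; with absorbing boundaries, the reverse recursion is lossy there and the saved strips become essential.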
... This problem has given rise to two research directions. The first has tried to use recent progress in high-performance parallel computing, such as that provided by the development of GPUs [8][9][10][11] as well as multiprocessing coprocessors like the Xeon Phi [12][13][14][15][16]. The other branch pursues the extreme optimisation of current FDTD-CPML algorithms so that they can be executed successfully on current computers. ...
Article
Full-text available
The FDTD method opened a fertile research area in the numerical analysis of electromagnetic phenomena under a wide range of media and propagation conditions, providing a rich analysis of electromagnetic behaviour such as propagation, reflection, refraction, and multi-trajectory phenomena, among others. In this paper we present an optimised FDTD-CPML algorithm focused on saving memory while increasing the algorithm's performance. We implement the FDTD-CPML method at high frequency bands, used in several telecommunications applications as well as in nano-electromagnetism. We show an analysis of the performance of the algorithm in single and double precision, as well as a stability analysis of the algorithm, from which we conclude that the implemented CPML ABC constitutes a robust choice in terms of precision and accuracy for the high frequencies considered herein. It is important to note that the CPML ABC parameters provided in this paper are fixed for the range of operation frequencies tested, from MHz to THz.
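At the heart of any FDTD code is the staggered leapfrog update of the electric and magnetic fields. A bare-bones 1D sketch in normalized units at Courant number 1, with the CPML absorbing layer and the paper's precision experiments omitted (grid size, step count, and the initial pulse are illustrative):

```python
# 1D FDTD leapfrog: H lives half a cell to the right of E (Yee staggering).
import math

n, steps = 200, 60
E = [math.exp(-0.01 * (i - 100) ** 2) for i in range(n)]  # initial E pulse
H = [0.0] * n

for _ in range(steps):
    for i in range(n - 1):            # update H from the curl of E
        H[i] += E[i + 1] - E[i]
    for i in range(1, n):             # update E from the curl of H
        E[i] += H[i] - H[i - 1]

# the pulse splits into two half-amplitude pulses leaving the center
```

Without an absorbing layer such as CPML, these pulses would reflect off the grid edges; the CPML terms modify exactly these two update loops near the boundaries.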
... We use 2D acoustic finite-difference numerical modeling (Nakata et al., 2011) to illustrate the benefits of GmRTM. Although we use only a 2D acoustic medium, we can apply GmRTM to 3D and elastic cases as well. ...
Article
Time reversal is a powerful tool used to directly image the location and mechanism of passive seismic sources. This technique assumes seismic velocities in the medium and propagates time-reversed observations of ground motion at each receiver location. Assuming an accurate velocity model and adequate array aperture, the waves focus at the source location. Because we do not know the location and origin time a priori, we need to scan the entire 4D image (3D in space and 1D in time) to localize the source, which makes time-reversal imaging computationally demanding. We have developed a new approach to time-reversal imaging that reduces the computational cost and the scanning dimensions from 4D to 3D (no time) and increases the spatial resolution of the source image. We first extrapolate wavefields individually at each receiver, and then we crosscorrelate these wavefields (the product in the frequency domain: the geometric mean). This crosscorrelation creates another imaging condition, and focusing of the seismic wavefields occurs at the zero time lag of the correlation, provided the velocity model is sufficiently accurate. Owing to the analogy with active-shot reverse time migration (RTM), we refer to this technique as the geometric-mean RTM, or GmRTM. In addition to reducing the dimension from 4D to 3D compared with conventional time-reversal imaging, the crosscorrelation effectively suppresses the side lobes and yields a spatially high-resolution image of seismic sources. GmRTM is robust to random and coherent noise because crosscorrelation enhances signal and suppresses noise. An added benefit is that, in contrast to conventional time-reversal imaging, GmRTM has the potential to be used to retrieve velocity information by analyzing time and/or space lags of the crosscorrelation, similar to what is done in active-source imaging.
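The multiplicative imaging condition can be seen in a 1D toy: extrapolate each receiver's wavefield separately (modeled here as analytic Gaussian pulses), then multiply them and sum over trial origin times. Receivers at 50 and 90, a true source at 70, unit velocity; all numbers are illustrative, not from the article:

```python
# GmRTM sketch: product of per-receiver back-extrapolated wavefields.
import math

def back_field(x, xr, trec, t0):
    """Back-extrapolated pulse from receiver xr at trial origin time t0."""
    return math.exp(-((abs(x - xr) - (trec - t0)) ** 2) / 8.0)

grid = range(0, 121)
image = []
for x in grid:
    s = 0.0
    for t0 in range(0, 20):
        s += back_field(x, 50, 20.0, t0) * back_field(x, 90, 20.0, t0)
    image.append(s)

peak = max(range(len(image)), key=image.__getitem__)
# peak == 70: the product focuses only where both wavefields coincide,
# while each individual field also has a mirror peak (at 30 and at 110)
```

A summed (conventional time-reversal) image would retain those mirror peaks as side lobes; the product suppresses them, which is the resolution benefit the abstract describes.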
... Komatitsch et al. (2010) discuss a GPU-based finite-element formulation of 3D anisotropic elastic wave propagation. Nakata et al. (2011) present results for solving the 3D isotropic elastic WE on multiple GPUs. These studies present impressive GPU runtimes of roughly one-tenth to one-twentieth of their corresponding multicore CPU-based implementations. ...
... This issue is compounded for 3D anisotropic media because the additional stiffness components (or equally anisotropic parameters) must also be held in memory. Fortunately, this issue can be addressed by parallel computing strategies that use domain decomposition to divide the computation across multiple GPU devices that work in concert through a communication protocol (Micikevicius, 2009;Nakata et al., 2011). ...
Article
Efficiently modeling seismic data sets in complex 3D anisotropic media by solving the 3D elastic wave equation is an important challenge in computational geophysics. Using a stress-stiffness formulation on a regular grid, we tested a 3D finite-difference time-domain solver using a second-order temporal and eighth-order spatial accuracy stencil that leverages the massively parallel architecture of graphics processing units (GPUs) to accelerate the computation of key kernels. The relatively small memory of an individual GPU limits the model domain sizes that can be computed on a single device. To circumvent this constraint and move toward modeling industry-sized 3D anisotropic elastic data sets, we parallelized computation across multiple GPU devices by using domain decomposition and, for each time step, employing an interdevice communication protocol to exchange data values falling within interior boundaries of each subdomain. For two or more GPU devices within a single compute node, we use direct peer-to-peer (i.e., GPU-to-GPU) communication, whereas for networked nodes we employed message-passing interface directives to route data over the network. Our 2D GPU-based anisotropic elastic modeling tests achieved a 10x speedup relative to an OpenMP CPU implementation run on an eight-core machine, whereas our 3D tests using dual-GPU devices produced up to a 28x speedup. The performance boost afforded by the GPU architecture allowed us to model seismic data for 3D anisotropic elastic models at lower hardware cost and in less time than has been previously possible.
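The domain-decomposition scheme in the abstract can be sketched in 1D: split the grid between two "devices", keep one ghost cell per interior boundary, and exchange ghost values every time step, mimicking the peer-to-peer or MPI halo exchange. A second-order update stands in for the real eighth-order stencil, and all sizes are illustrative:

```python
# Two-subdomain halo exchange reproducing a single-domain run exactly.
import math

def step(u_prev, u_cur, r2=0.25):
    nxt = list(u_cur)                       # edges kept; interior updated
    for i in range(1, len(u_cur) - 1):
        nxt[i] = 2*u_cur[i] - u_prev[i] + r2*(u_cur[i-1] - 2*u_cur[i] + u_cur[i+1])
    return nxt

n, T = 64, 50
u0 = [math.exp(-0.05 * (i - 32) ** 2) for i in range(n)]

# Reference: single-domain run.
rp, rc = u0[:], u0[:]
for _ in range(T):
    rp, rc = rc, step(rp, rc)

# Decomposed: left owns 0..31, right owns 32..63, one ghost cell each.
Lp, Lc = u0[:33], u0[:33]                   # global 0..32 (local 32 is ghost)
Rp, Rc = u0[31:], u0[31:]                   # global 31..63 (local 0 is ghost)
for _ in range(T):
    Ln, Rn = step(Lp, Lc), step(Rp, Rc)
    Ln[32], Rn[0] = Rn[1], Ln[31]           # halo exchange ("peer-to-peer" copy)
    Lp, Lc, Rp, Rc = Lc, Ln, Rc, Rn

stitched = Lc[:32] + Rc[1:]
err = max(abs(stitched[i] - rc[i]) for i in range(n))
# err stays at round-off level: the exchange makes the split run exact
```

With an eighth-order stencil the halo is four cells wide instead of one, and overlapping the exchange with interior computation is what hides the transfer cost.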
... However, since the architecture of the GPU differs from that of a conventional CPU, the programming paradigm must change. This has led to the development of a new research field within scientific computing which explores the performance of the GPU for general-purpose applications, such as acoustic simulation [3], propagation of seismic waves [4], seismic migration [5], molecular engineering [6], fluid dynamics [7], even astrophysical simulations [8], and many other implementations. In a few words, the objective of general-purpose computing on the GPU (GPGPU) is to develop new applications for those who aim to solve numerical simulation problems in as little computing time as possible. ...
Article
Full-text available
An implementation with CUDA technology on a single and on several graphics processing units (GPUs) is presented for the calculation of the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unitary prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of several applications. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Unlike a trivial decomposition of the domain, we propose to decompose the problem into sets of prisms and use different memory spaces per CUDA processing core, avoiding the performance decay that would result from the constant kernel-function calls needed in a parallelization by observation points. The design and implementation created are the main contributions of this work, because the parallelization scheme implemented is not trivial. The performance results obtained are comparable to those of a small processing cluster.
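The decomposition-by-prisms idea rests on the field being a plain sum over sources: the vertical attraction at each observation point is a sum over prisms, so the prism set can be partitioned across devices and the partial sums added. A toy sketch with prisms collapsed to point masses and unit constants (not the article's real prism formula or geometry):

```python
# Partitioning the prism sum across two "devices" changes only the schedule.
import math

def gz(obs, prism):
    """Point-mass vertical attraction of one 'prism' (x, y, z, mass), G = 1."""
    ox, oy = obs
    px, py, pz, m = prism
    r = math.sqrt((ox - px)**2 + (oy - py)**2 + pz**2)
    return m * pz / r**3

prisms = [(float(i), float(j), 2.0, 1.0) for i in range(8) for j in range(8)]
obs_points = [(3.5, 3.5), (0.0, 0.0)]   # one central, one corner observation

# Full sum versus the same sum split into two "device" chunks.
full    = [sum(gz(o, p) for p in prisms) for o in obs_points]
chunks  = [prisms[:32], prisms[32:]]
partial = [sum(sum(gz(o, p) for p in c) for c in chunks) for o in obs_points]
# partial matches full up to round-off: the partition is embarrassingly parallel
```

Because each chunk writes into its own accumulator, no synchronization is needed until the final reduction, which is the performance rationale the abstract gives for decomposing by prisms rather than by observation points.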