Fig 1 - uploaded by Takeshi Tsuji
Schematic diagram of the hardware implementation of a GPU. The GPU is the computation unit on a GPU board and includes processors and memory. Device memory sits outside the GPU chip but still on the GPU board. The numbers of multiprocessors and processors shown are those of the NVIDIA Tesla C1060.


Source publication
Article
Full-text available
Numerical simulation in exploration geophysics provides important insights into subsurface wave propagation phenomena. Although elastic wave simulations take longer to compute than acoustic simulations, an elastic simulator can construct more realistic wavefields including shear components. Therefore, it is suitable for exploration of the responses...

Similar publications

Preprint
Full-text available
This work explores the feasibility of performing three-dimensional molecular gas dynamics simulations of hypersonic flows such as re-entry vehicles through directly solving the six-dimensional nonlinear Boltzmann equation closed with the BGK (Bhatnagar-Gross-Krook) collision model. Through the combination of high-order unstructured spatial discreti...
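The BGK collision model mentioned above relaxes the distribution function toward a local equilibrium at a rate set by a relaxation time. A minimal zero-dimensional sketch of that relaxation (the step size, relaxation time, and starting values are hypothetical illustration numbers, not from the preprint):

```python
# BGK relaxation: df/dt = (f_eq - f) / tau, integrated with forward Euler.
def bgk_step(f, f_eq, dt, tau):
    """One explicit time step of the BGK collision operator."""
    return f + (dt / tau) * (f_eq - f)

f, f_eq = 1.0, 0.0          # start away from equilibrium
dt, tau = 0.01, 1.0         # hypothetical step size and relaxation time
for _ in range(1000):
    f = bgk_step(f, f_eq, dt, tau)
# f has decayed toward f_eq (roughly exp(-10) of the initial offset)
```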
Article
Full-text available
The alternating direction method of multipliers (ADMM) is a powerful operator splitting technique for solving structured convex optimization problems. Due to its relatively low per-iteration computational cost and ability to exploit sparsity in the problem data, it is particularly suitable for large-scale optimization. However, the method may still...
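The operator splitting behind ADMM can be seen on a toy lasso-type problem: minimize 0.5*||x - b||^2 + lam*||x||_1 with the split x = z, alternating a quadratic x-update, a soft-thresholding z-update, and a dual update. A minimal sketch with made-up data (not from the article; the closed-form answer is elementwise soft-thresholding of b):

```python
# ADMM for: minimize 0.5*||x - b||^2 + lam*||x||_1, split as x = z.
def soft(v, k):
    """Soft-thresholding, the proximal operator of k*|.|_1."""
    return [max(abs(vi) - k, 0.0) * (1 if vi > 0 else -1) for vi in v]

def admm_lasso(b, lam, rho=1.0, iters=200):
    n = len(b)
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(iters):
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        z = soft([x[i] + u[i] for i in range(n)], lam / rho)   # sparsity step
        u = [u[i] + x[i] - z[i] for i in range(n)]             # dual ascent
    return z

sol = admm_lasso([3.0, -0.5, 1.0], lam=1.0)
# converges to elementwise soft-thresholding of b: [2.0, 0.0, 0.0]
```

The per-iteration cost is a handful of vector operations, which is why the method suits large-scale problems.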
Article
Full-text available
Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis but it is currently considered as “dark data”, i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the ne...
Conference Paper
Full-text available
It is unquestionable that successive hardware generations have significantly improved GPU computing workload performance over the last several years. Moore's law and DRAM scaling have respectively increased single-chip peak instruction throughput by 3X and off-chip bandwidth by 2.2X from NVIDIA's GeForce 8800 GTX in November 2006 to its GeForce GTX...

Citations

... A cascade of absorbing boundary conditions and an exponential-damping attenuation layer was used to remove the effects of the artificial boundaries [51]. To meet the computational demands of the FD algorithm, we applied a parallel computational architecture with a GPU to speed up our elastic wave simulation [52][53][54][55][56]. The forward modeling code was developed based on the work of Weiss and Shragge [57] in Madagascar, an open-source software package for multidimensional seismic data analysis. ...
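The exponential-damping attenuation layer cited above multiplies the wavefield near the model edges by a taper that decays smoothly toward the boundary. A sketch of such a Cerjan-style profile (the layer width and strength are hypothetical illustration values):

```python
# Exponential-damping taper for an attenuation boundary layer.
import math

def damping_profile(n, nb, alpha=0.015):
    """Multiplicative taper: 1 in the interior, decaying toward both edges."""
    w = [1.0] * n
    for i in range(nb):
        d = math.exp(-(alpha * (nb - i)) ** 2)
        w[i] = d            # left edge
        w[n - 1 - i] = d    # right edge
    return w

taper = damping_profile(n=200, nb=30)
# interior cells stay at 1.0; cells inside the layer are damped below 1
```

Applying the taper to the wavefield every time step gradually absorbs energy entering the layer, suppressing artificial edge reflections.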
Article
Full-text available
The demand for deep prospecting has led to an increase in the enthusiasm for seismic techniques in mineral exploration. Reflection seismology applications in the base metal industry have achieved success. For orogenic gold deposits, however, their applicable conditions remain to be investigated. This paper simulated seismic wave propagation based on a finite-difference algorithm with an accuracy of eighth order in space and second order in time to investigate the factors influencing the reflection seismic exploration results. Then, the paper assessed the algorithm’s feasibility for orogenic gold deposits, taking the giant Zaozigou deposit in central China as an example. The forward modeling showed that the petrophysical properties, dimensions, and dip of targets significantly affected the seismic exploration results. In the Zaozigou model, shallowly dipping orebodies were well imaged with precise extension and thickness. Steeply dipping orebodies were recognized but their thickness information was lost. Steeply dipping orebodies at depth were not detectable under a surface configuration. These problems could be effectively solved by increasing the array length and using vertical seismic profiling methods. For small orebodies, multiwave and multicomponent seismic techniques offered more valuable information in terms of mineral exploration. In conclusion, it was possible to locate orogenic gold deposits using the reflection seismology method.
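The abstract's scheme (eighth-order accuracy in space, second-order in time) corresponds to a standard central stencil. A one-dimensional sketch of a single constant-velocity time step, with illustrative grid sizes (the paper's solver is 2D/3D and more elaborate):

```python
# One time step of a 1D acoustic update, 8th-order space / 2nd-order time.
C = [-205.0/72.0, 8.0/5.0, -1.0/5.0, 8.0/315.0, -1.0/560.0]  # d2/dx2 weights

def step(u_prev, u_cur, r2):
    """u_next = 2*u_cur - u_prev + r2 * (8th-order Laplacian); r2 = (v*dt/dx)^2."""
    n = len(u_cur)
    u_next = list(u_cur)              # keep edge values; update interior only
    for i in range(4, n - 4):
        lap = C[0] * u_cur[i]
        for k in range(1, 5):
            lap += C[k] * (u_cur[i - k] + u_cur[i + k])
        u_next[i] = 2.0 * u_cur[i] - u_prev[i] + r2 * lap
    return u_next

u = [1.0] * 32
nxt = step(u, u, r2=0.2)
# a constant field is a steady state: the stencil weights sum to zero
```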
... Micikevicius (2009) and Abdelkhalek et al. (2009) discuss GPU implementations of finite-difference (FD) algorithms designed to solve the acoustic wave equation. Nakata et al. (2011) present solutions of the 3D isotropic elastic wave equation on multiple GPUs. Weiss and Shragge (2013) introduce an FD algorithm for modeling elastic wave propagation in anisotropic media and discuss both single- and multi-GPU implementations. ...
... Nowadays, hundreds of scientific applications have been migrated to the GPU, so it is impossible to mention all of them, but the most current and representative ones that have benefited from drastic speed-ups in their computing times [17] include applications for: flows in porous media [5], graph compression [6], MPI-combined libraries for image processing [4], query speed-up in databases [16], multi-physics modeling [7], solving Boltzmann transport equations [13], CFD code speed-up on non-uniform grids [19], direct modeling of gravitational fields [3], reconstructing 3D images [20], propagating acoustic waves [11], solving Lyapunov equations for control theory [8], studies of convective turbulence [2], radiative transport modeling [1], and computation of Lagrangian coherent structures [9], just to mention some of them. ...
Chapter
Full-text available
In recent years, the increasing need to speed up the execution of numerical algorithms has led researchers to the use of co-processors and graphics cards such as NVIDIA GPUs. Although the CUDA C meta-language was introduced to facilitate the development of general-purpose applications, the answer to the common question "How do I allocate (cudaMalloc) a two-dimensional array?" is not simple. In this paper, we present a memory structure that allows the use of multidimensional arrays inside a CUDA kernel. To demonstrate its functionality, this structure is applied to the explicit finite-difference solution of the non-steady heat transport equation.
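The usual workaround for the cudaMalloc question is to allocate one flat, contiguous buffer and compute row-major offsets by hand. This pure-Python sketch only mirrors the index arithmetic a CUDA kernel would perform on such a buffer (shapes and values are illustrative; the chapter's actual memory structure is more general):

```python
# Flat buffer indexed as a 2D array: element (r, c) lives at r * ncols + c.
rows, cols = 4, 6                     # illustrative grid shape
flat = [0.0] * (rows * cols)          # stands in for one cudaMalloc'd buffer

def idx(r, c, ncols):
    """Row-major offset into the flat buffer."""
    return r * ncols + c

for r in range(rows):
    for c in range(cols):
        flat[idx(r, c, cols)] = 10 * r + c   # recognizable fill pattern

# flat[idx(2, 3, cols)] now holds 23
```

Keeping the buffer contiguous is what makes coalesced memory access possible on the device.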
... Since a single GPU has limited memory and is unfit for the calculation of RTM SOGs, which require large memory space, multiple GPUs allow the expansion of the processor memory so as to effectively solve this problem [23]. ...
Article
Full-text available
As an important method for seismic data processing, reverse time migration (RTM) has high precision but involves high-intensity calculations. The calculation of RTM surface offset (shot-receiver distance) domain gathers provides intermediary data for an iterative calculation of migration and its velocity building. How to generate such data efficiently is of great significance to the industrial application of RTM. We propose a method for the calculation of surface offset gathers (SOGs) based on attribute migration wherein, using two migration calculations, the attribute profile of the surface offsets can be obtained, and thus the image results can be sorted into offset gathers. To address the high-intensity computations required for RTM, we put forth a multi-graphics processing unit (GPU) computational strategy: image computational domains are distributed to different GPUs, and multi-stream calculations are used to conceal data transmission between GPUs. The resulting computing efficiency was higher than that of a single GPU and scaled nearly linearly as more GPUs were used. A model test showed that the attribute migration method correctly outputs SOGs, while GPU parallel computation effectively improves computing efficiency. It is therefore of practical importance for this method to be extended and applied in industry.
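The two-migration idea can be sketched in miniature: run the "migration" (here reduced to a weighted stack at one image point) once on the data and once on the data scaled by each trace's offset; the ratio assigns an offset attribute to the image point. All numbers below are toy values, not the article's workflow:

```python
# Attribute-migration sketch at a single image point.
traces = [   # (offset in m, migrated amplitude contribution) - toy data
    (100.0, 0.02),
    (300.0, 0.90),   # dominant contribution
    (500.0, 0.08),
]

image     = sum(a for _, a in traces)        # ordinary migration
image_off = sum(h * a for h, a in traces)    # offset-scaled migration
attribute = image_off / image                # amplitude-weighted offset

# the attribute lands near the dominant trace's 300 m offset
```

Sorting image samples by this per-point attribute is what lets the results be binned into surface offset gathers.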
... The algorithm was also parallelized on NVIDIA GPGPU graphics processors using CUDA technology. This technology is widely used for parallelizing computational algorithms, including explicit ones [11][12][13][14][15]. It required a complete rewrite of part of the computational module for this architecture [21]. ...
... By allowing hundreds of threads to run concurrently, the GPU delivers a significant speedup over conventional CPUs for SIMD-type problems (Cheng et al., 2014). Some numerical simulation techniques for seismic wave propagation have been successfully ported to GPUs, for instance, the 2D and 3D finite-difference methods (Michéa and Komatitsch, 2010; Nakata et al., 2011; Weiss and Shragge, 2013), the finite-element method (Rietmann et al., 2012), and the Galerkin method (Klöckner et al., 2009). Reverse time migration (RTM) has also been parallelized for GPUs (Leader and Clapp, 2012; Ying et al., 2013; Liu et al., 2013). ...
Article
Full-text available
Full waveform inversion (FWI) is a challenging procedure due to the high computational cost of the modeling, especially in the elastic case. The graphics processing unit (GPU) has become a popular device for high-performance computing (HPC). To reduce the long computation time, we design and implement a GPU-based 2D elastic FWI (EFWI) in the time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of the relatively small global memory on the GPU, a boundary-saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion improves the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve a >15 times speedup in forward modeling and about a 12 times speedup in gradient calculation, compared with eight-core CPU implementations optimized with OpenMP. The results from the GPU implementations are verified to have sufficient accuracy by comparison with the results obtained from the CPU implementations.
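The boundary-saving strategy mentioned in the abstract keeps only the last snapshots of the forward wavefield plus thin boundary strips at each step, then re-runs the time recursion backward to regenerate earlier states instead of storing them all. A 1D sketch with a second-order stencil and fixed (Dirichlet) edges for brevity; grid size, step count, and the stencil are illustrative stand-ins for the paper's solver:

```python
# Boundary saving: forward run stores strips, backward run reconstructs.
import math

n, T, r2 = 101, 200, 0.25
u0 = [math.exp(-0.05 * (i - 50) ** 2) for i in range(n)]

def forward_step(u_prev, u_cur):
    nxt = [0.0] * n
    for i in range(1, n - 1):
        nxt[i] = 2*u_cur[i] - u_prev[i] + r2*(u_cur[i-1] - 2*u_cur[i] + u_cur[i+1])
    return nxt

u_prev, u_cur, saved = u0[:], u0[:], []
for _ in range(T):
    u_next = forward_step(u_prev, u_cur)
    saved.append((u_next[1], u_next[n - 2]))   # boundary strip of u[t+1]
    u_prev, u_cur = u_cur, u_next

# Backward: the same recursion solved for the earlier snapshot.
w_next, w_cur = u_cur, u_prev                  # u[T], u[T-1]
for t in range(T - 1, 0, -1):
    w_cur = list(w_cur)
    w_cur[1], w_cur[n - 2] = saved[t - 1]      # restore saved strip of u[t]
    w_prev = [0.0] * n
    for i in range(1, n - 1):
        w_prev[i] = 2*w_cur[i] - w_next[i] + r2*(w_cur[i-1] - 2*w_cur[i] + w_cur[i+1])
    w_next, w_cur = w_cur, w_prev

err = max(abs(w_cur[i] - u0[i]) for i in range(1, n - 1))
# err stays at floating-point round-off level: reconstruction matches u[0]
```

With these Dirichlet edges the strip is already recoverable, but restoring the saved values is the general pattern; with absorbing boundaries, the reverse recursion is lossy there and the saved strips become essential.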
... This problem has given rise to two research directions. The first has tried to use recent progress in high-performance parallel computing, such as that provided by the development of GPUs [8][9][10][11] as well as multiprocessing coprocessors like the Xeon Phi [12][13][14][15][16]. The other branch pursues the extreme optimisation of current FDTD-CPML algorithms so that they can be executed successfully on current computers. ...
Article
Full-text available
The FDTD method opened a fertile research area in the numerical analysis of electromagnetic phenomena under a wide range of media and propagation conditions, providing a rich analysis of electromagnetic behaviour such as propagation, reflection, refraction, and multi-trajectory phenomena, among others. In this paper we present an optimised FDTD-CPML algorithm focused on saving memory while increasing the algorithm's performance. We implement the FDTD-CPML method at high frequency bands, used in several telecommunications applications as well as in nano-electromagnetism. We show an analysis of the performance of the algorithm in single and double precision, as well as a stability analysis of the algorithm, from which we conclude that the implemented CPML ABC constitutes a robust choice in terms of precision and accuracy for the high frequencies considered herein. It is important to note that the CPML ABC parameters provided in this paper are fixed for the range of operation frequencies tested, from MHz to THz.
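At the heart of any FDTD code is the staggered leapfrog update of the electric and magnetic fields. A bare-bones 1D sketch in normalized units at Courant number 1, with the CPML absorbing layer and the paper's precision experiments omitted (grid size, step count, and the initial pulse are illustrative):

```python
# 1D FDTD leapfrog: H lives half a cell to the right of E (Yee staggering).
import math

n, steps = 200, 60
E = [math.exp(-0.01 * (i - 100) ** 2) for i in range(n)]  # initial E pulse
H = [0.0] * n

for _ in range(steps):
    for i in range(n - 1):            # update H from the curl of E
        H[i] += E[i + 1] - E[i]
    for i in range(1, n):             # update E from the curl of H
        E[i] += H[i] - H[i - 1]

# the pulse splits into two half-amplitude pulses leaving the center
```

Without an absorbing layer such as CPML, these pulses would reflect off the grid edges; the CPML terms modify exactly these two update loops near the boundaries.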
... We use 2D acoustic finite-difference numerical modeling (Nakata et al., 2011) to illustrate the benefits of GmRTM. Although we use only a 2D acoustic medium, we can apply GmRTM to 3D and elastic cases as well. ...
Article
Time reversal is a powerful tool used to directly image the location and mechanism of passive seismic sources. This technique assumes seismic velocities in the medium and propagates time-reversed observations of ground motion at each receiver location. Assuming an accurate velocity model and adequate array aperture, the waves focus at the source location. Because we do not know the location and origin time a priori, we need to scan the entire 4D image (3D in space and 1D in time) to localize the source, which makes time-reversal imaging computationally demanding. We have developed a new approach to time-reversal imaging that reduces the computational cost and the scanning dimensions from 4D to 3D (no time) and increases the spatial resolution of the source image. We first extrapolate wavefields individually at each receiver, and then we crosscorrelate these wavefields (the product in the frequency domain: the geometric mean). This crosscorrelation creates another imaging condition, and focusing of the seismic wavefields occurs at the zero time lag of the correlation, provided the velocity model is sufficiently accurate. Owing to the analogy with active-shot reverse time migration (RTM), we refer to this technique as the geometric-mean RTM, or GmRTM. In addition to reducing the dimension from 4D to 3D compared with conventional time-reversal imaging, the crosscorrelation effectively suppresses the side lobes and yields a spatially high-resolution image of seismic sources. GmRTM is robust to random and coherent noise because crosscorrelation enhances signal and suppresses noise. An added benefit is that, in contrast to conventional time-reversal imaging, GmRTM has the potential to be used to retrieve velocity information by analyzing time and/or space lags of the crosscorrelation, similar to what is done in active-source imaging.
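The multiplicative imaging condition can be seen in a 1D toy: extrapolate each receiver's wavefield separately (modeled here as analytic Gaussian pulses), then multiply them and sum over trial origin times. Receivers at 50 and 90, a true source at 70, unit velocity; all numbers are illustrative, not from the article:

```python
# GmRTM sketch: product of per-receiver back-extrapolated wavefields.
import math

def back_field(x, xr, trec, t0):
    """Back-extrapolated pulse from receiver xr at trial origin time t0."""
    return math.exp(-((abs(x - xr) - (trec - t0)) ** 2) / 8.0)

grid = range(0, 121)
image = []
for x in grid:
    s = 0.0
    for t0 in range(0, 20):
        s += back_field(x, 50, 20.0, t0) * back_field(x, 90, 20.0, t0)
    image.append(s)

peak = max(range(len(image)), key=image.__getitem__)
# peak == 70: the product focuses only where both wavefields coincide,
# while each individual field also has a mirror peak (at 30 and at 110)
```

A summed (conventional time-reversal) image would retain those mirror peaks as side lobes; the product suppresses them, which is the resolution benefit the abstract describes.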
... Komatitsch et al. (2010) discuss a GPU-based finite-element formulation of 3D anisotropic elastic wave propagation. Nakata et al. (2011) present results for solving the 3D isotropic elastic WE on multiple GPUs. These studies present impressive GPU runtimes of roughly one-tenth to one-twentieth of their corresponding multicore CPU-based implementations. ...
... This issue is compounded for 3D anisotropic media because the additional stiffness components (or equally anisotropic parameters) must also be held in memory. Fortunately, this issue can be addressed by parallel computing strategies that use domain decomposition to divide the computation across multiple GPU devices that work in concert through a communication protocol (Micikevicius, 2009;Nakata et al., 2011). ...
Article
Efficiently modeling seismic data sets in complex 3D anisotropic media by solving the 3D elastic wave equation is an important challenge in computational geophysics. Using a stress-stiffness formulation on a regular grid, we tested a 3D finite-difference time-domain solver using a second-order temporal and eighth-order spatial accuracy stencil that leverages the massively parallel architecture of graphics processing units (GPUs) to accelerate the computation of key kernels. The relatively small memory of an individual GPU limits the model domain sizes that can be computed on a single device. To circumvent this constraint and move toward modeling industry-sized 3D anisotropic elastic data sets, we parallelized computation across multiple GPU devices by using domain decomposition and, for each time step, employing an interdevice communication protocol to exchange data values falling within interior boundaries of each subdomain. For two or more GPU devices within a single compute node, we use direct peer-to-peer (i.e., GPU-to-GPU) communication, whereas for networked nodes we employed message-passing interface directives to route data over the network. Our 2D GPU-based anisotropic elastic modeling tests achieved a 10x speedup relative to an OpenMP CPU implementation run on an eight-core machine, whereas our 3D tests using dual-GPU devices produced up to a 28x speedup. The performance boost afforded by the GPU architecture allowed us to model seismic data for 3D anisotropic elastic models at lower hardware cost and in less time than has been previously possible.
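The domain-decomposition scheme in the abstract can be sketched in 1D: split the grid between two "devices", keep one ghost cell per interior boundary, and exchange ghost values every time step, mimicking the peer-to-peer or MPI halo exchange. A second-order update stands in for the real eighth-order stencil, and all sizes are illustrative:

```python
# Two-subdomain halo exchange reproducing a single-domain run exactly.
import math

def step(u_prev, u_cur, r2=0.25):
    nxt = list(u_cur)                       # edges kept; interior updated
    for i in range(1, len(u_cur) - 1):
        nxt[i] = 2*u_cur[i] - u_prev[i] + r2*(u_cur[i-1] - 2*u_cur[i] + u_cur[i+1])
    return nxt

n, T = 64, 50
u0 = [math.exp(-0.05 * (i - 32) ** 2) for i in range(n)]

# Reference: single-domain run.
rp, rc = u0[:], u0[:]
for _ in range(T):
    rp, rc = rc, step(rp, rc)

# Decomposed: left owns 0..31, right owns 32..63, one ghost cell each.
Lp, Lc = u0[:33], u0[:33]                   # global 0..32 (local 32 is ghost)
Rp, Rc = u0[31:], u0[31:]                   # global 31..63 (local 0 is ghost)
for _ in range(T):
    Ln, Rn = step(Lp, Lc), step(Rp, Rc)
    Ln[32], Rn[0] = Rn[1], Ln[31]           # halo exchange ("peer-to-peer" copy)
    Lp, Lc, Rp, Rc = Lc, Ln, Rc, Rn

stitched = Lc[:32] + Rc[1:]
err = max(abs(stitched[i] - rc[i]) for i in range(n))
# err stays at round-off level: the exchange makes the split run exact
```

With an eighth-order stencil the halo is four cells wide instead of one, and overlapping the exchange with interior computation is what hides the transfer cost.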
... However, since the architecture of the GPU differs from that of a conventional CPU, the programming paradigm must change. This has led to the development of a new research field within scientific computing which explores the performance of the GPU for general-purpose applications, such as acoustic simulation [3], propagation of seismic waves [4], seismic migration [5], molecular engineering [6], fluid dynamics [7], even astrophysical simulations [8], and many other implementations. In a few words, the objective of general-purpose computing on the GPU (GPGPU) is to develop new applications for those who aim to solve numerical simulation problems in as little computing time as possible. ...
Article
Full-text available
An implementation with CUDA technology on a single and on several graphics processing units (GPUs) is presented for the calculation of the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unitary prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of several applications. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Unlike a trivial decomposition of the domain, we propose to decompose the problem into sets of prisms and use different memory spaces per CUDA processing core, avoiding the performance decay that would result from the constant kernel-function calls needed in a parallelization by observation points. The design and implementation created are the main contributions of this work, because the parallelization scheme implemented is not trivial. The performance results obtained are comparable to those of a small processing cluster.
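The decomposition-by-prisms idea rests on the field being a plain sum over sources: the vertical attraction at each observation point is a sum over prisms, so the prism set can be partitioned across devices and the partial sums added. A toy sketch with prisms collapsed to point masses and unit constants (not the article's real prism formula or geometry):

```python
# Partitioning the prism sum across two "devices" changes only the schedule.
import math

def gz(obs, prism):
    """Point-mass vertical attraction of one 'prism' (x, y, z, mass), G = 1."""
    ox, oy = obs
    px, py, pz, m = prism
    r = math.sqrt((ox - px)**2 + (oy - py)**2 + pz**2)
    return m * pz / r**3

prisms = [(float(i), float(j), 2.0, 1.0) for i in range(8) for j in range(8)]
obs_points = [(3.5, 3.5), (0.0, 0.0)]   # one central, one corner observation

# Full sum versus the same sum split into two "device" chunks.
full    = [sum(gz(o, p) for p in prisms) for o in obs_points]
chunks  = [prisms[:32], prisms[32:]]
partial = [sum(sum(gz(o, p) for p in c) for c in chunks) for o in obs_points]
# partial matches full up to round-off: the partition is embarrassingly parallel
```

Because each chunk writes into its own accumulator, no synchronization is needed until the final reduction, which is the performance rationale the abstract gives for decomposing by prisms rather than by observation points.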