Figure 5
Ghost cells are indicated by dashed lines. With one layer of ghost cells, isocontouring produces correct geometry (the gray triangle) in the local partition. The geometry in the ghost cells, however, is incorrect (red triangles) because the values at points P7 and P8 are faulty. The normal calculated for vertex V3 will therefore be inaccurate, since the triangles in cells C6, C7, and C8 are all incorrect, which results in lighting artifacts when the isocontour is rendered.
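To make the normal problem concrete, here is a minimal NumPy sketch (my own toy coordinates and triangle list, not the data behind the figure) that averages the area-weighted normals of the triangles incident to a shared vertex; corrupting one incident triangle, as a faulty ghost value would, visibly tilts the vertex normal.

```python
import numpy as np

def triangle_normal(a, b, c):
    """Unnormalized normal of triangle (a, b, c); its length is twice the area."""
    return np.cross(b - a, c - a)

def vertex_normal(vertex_id, points, triangles):
    """Average the area-weighted normals of all triangles incident to a vertex."""
    n = np.zeros(3)
    for tri in triangles:
        if vertex_id in tri:
            a, b, c = (points[i] for i in tri)
            n += triangle_normal(a, b, c)
    norm = np.linalg.norm(n)
    return n / norm if norm > 0 else n

# Toy example: three triangles sharing vertex 0 (playing the role of V3).
points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])
triangles = [(0, 1, 2), (0, 2, 3), (0, 3, 4)]

print(vertex_normal(0, points, triangles))   # correct: (0, 0, 1)
points[3, 2] = 0.5                           # corrupt one neighbor, as a bad ghost value would
print(vertex_normal(0, points, triangles))   # tilted normal -> shading seam at the partition boundary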

Source publication
Conference Paper
Full-text available
Ghost cells are important for distributed memory parallel operations that require neighborhood information, and are required for correctness on the boundaries of local data partitions. Ghost cells are one or more layers of grid cells surrounding the external boundary of the local partition, which are owned by other data partitions. They are used by...

Contexts in source publication

Context 1
... single layer of ghost cells supplies enough information for a cell-to-point algorithm over a distributed data set to calculate the correct values for points on the parallel partition boundary. Figure 5 shows the result of using one layer of ghost cells. A topologically correct isocontour is produced with this single layer of ghost cells. ...
Context 2
... the same way an isocontour produces incorrect triangles on the boundary without a first layer of ghost cells, the first layer of ghost cells will produce incorrect triangles without another layer of ghost cells. This problem can be seen in Figure 5. ...
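A minimal 1-D sketch of the cell-to-point step described in these contexts (plain NumPy with made-up cell values; the paper's algorithm operates on general distributed grids): the point on the partition boundary only matches the serial result when the single ghost cell owned by the neighboring partition is included.

```python
import numpy as np

def cell_to_point(cells):
    """1-D cell-to-point average: each point takes the mean of its adjacent cells
    (end points copy their single adjacent cell)."""
    padded = np.concatenate(([cells[0]], cells, [cells[-1]]))
    return 0.5 * (padded[:-1] + padded[1:])

# A global 1-D field of 8 cell values, split into two partitions of 4 cells each.
global_cells = np.arange(8, dtype=float)      # cell values 0..7
reference = cell_to_point(global_cells)       # what a serial run would produce

local = global_cells[:4]                      # partition 0, no ghost cells
with_ghost = global_cells[:5]                 # partition 0 plus one ghost cell from partition 1

print(cell_to_point(local)[4])                # 3.0 -> wrong value at the partition-boundary point
print(cell_to_point(with_ghost)[4])           # 3.5 -> matches the serial result
print(reference[4])                           # 3.5
```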

Citations

... The first part of this paper presents the process of generating the domain decomposition with multiple layers of ghost cells in a parallel manner. The ParMETIS [23] library is used to perform the graph partitioning in parallel, and the multiple layers of ghost cells are added with a parallel algorithm derived from [31]. The performance of this generation process is studied in detail for 2-dimensional and 3-dimensional meshes, both in terms of execution time and memory. ...
... On regular structured meshes, their computation is quite straightforward [40,26]. On unstructured meshes, however, it becomes much more complex and requires dedicated algorithms [27,11,31]. The number of required ghost-layers depends on the discretization's stencil. ...
... To add multiple layers of ghost cells to each sub-domain, we use an algorithm derived from [31]. While the goal in the original paper was to exchange multiple layers of ghost cells once for visualization purposes in Paraview [1], we want to add the ghost cells to each sub-domain while keeping track of where they came from so that we can perform a memory exchange at each time step of the CFD simulations. ...
Article
The resolution of the Shallow-water equations is of practical interest in the study of inundations and often requires very large and dense meshes to accurately simulate river flows. Those large meshes are often decomposed into multiple sub-domains to allow for parallel processing. When such a decomposition process is used in the context of distributed parallel computing, each sub-domain requires an exchange of one or more layers of ghost cells at each time step of the simulation due to the spatial dependency of numerical methods. In the first part of this paper, we show how the domain decomposition and ghost-layer generation process can be performed in a parallel manner for large meshes, and show a new way of storing the resulting sub-domains with all their send/receive information within a single CGNS mesh file. The performance of the ghost-layer generation process is studied both in terms of time and memory on 2D and 3D meshes containing up to 70 million cells. In the second part of the paper, the program developed in the first part is used to generate the domain decomposition of large meshes of practical interest in the study of natural free surface flows. We use our in-house multi-CPU multi-GPU (MPI+CUDA) solver to show the impact of multiple layers of ghost cells on the execution times of the first-order HLLC, and second-order WAF and MUSCL methods. Finally, the parallel solver is used to perform the second-order resolution on real large-scale meshes of rivers near Montréal, using up to 32 GPUs on meshes of 13 million cells.
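The per-time-step exchange described here can be sketched with mpi4py (an illustration under assumed index lists and exactly two ranks, not the authors' CUDA-Fortran solver): each rank records which of its owned cells the neighbor needs and where the neighbor's values land locally, then refreshes its ghost slots before every update.

```python
# Illustrative sketch only: assumes mpi4py, two ranks (mpirun -n 2), and
# hypothetical send/recv index lists standing in for the stored CGNS metadata.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
neighbor = 1 - rank                                # the single neighboring sub-domain

n_owned, n_ghost = 8, 2
cells = np.full(n_owned + n_ghost, float(rank))    # owned values followed by ghost slots

send_ids = np.arange(n_owned - n_ghost, n_owned)   # owned cells the neighbor needs
recv_ids = np.arange(n_owned, n_owned + n_ghost)   # local ghost slots filled by the neighbor

def exchange_ghosts(cells):
    """Refresh ghost values from the neighboring rank (done at every time step)."""
    sendbuf = np.ascontiguousarray(cells[send_ids])
    recvbuf = np.empty(len(recv_ids))
    comm.Sendrecv(sendbuf, dest=neighbor, recvbuf=recvbuf, source=neighbor)
    cells[recv_ids] = recvbuf

for step in range(3):                              # toy "time loop"
    exchange_ghosts(cells)
    cells[:n_owned] = 0.5 * (cells[:n_owned] + cells.mean())   # stand-in for the flux update

if rank == 0:
    print(cells)
```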
... The input data to our algorithm is assumed to be partitioned onto the nodes, so that each node is responsible for a subset of M. We require one layer of ghost cells [37] that is assumed in the input data but could be constructed with one message between all nodes holding adjacent vertices. Within this setting, the neighbors and value of a vertex v are available only on nodes that are responsible for v or any of its neighbors. ...
Article
Full-text available
Contemporary scientific data sets require fast and scalable topological analysis to enable visualization, simplification and interaction. Within this field, parallel merge tree construction has seen abundant recent contributions, with a trend of decentralized, task-parallel or SMP-oriented algorithms dominating in terms of total runtime. However, none of these recent approaches computed complete merge trees on distributed systems, leaving this field to traditional divide and conquer approaches. This paper introduces a scalable, parallel and distributed algorithm for merge tree construction outperforming the previously fastest distributed solution by a factor of around three. This is achieved by a task-parallel identification of individual merge tree arcs by growing regions around critical points in the data, without any need for ordered progression or global data structures, based on a novel insight introducing a sufficient local boundary for region growth.
... As it is only done once for each mesh, this serial domain decomposition is good enough for our purposes. In the future, this decomposition may need to be upgraded by using parallel computing, as in Patchett et al. (2017). ...
Preprint
This paper shows the development of a multi-GPU version of a time-explicit finite volume solver for the Shallow-Water Equations (SWE) on a multi-GPU architecture. MPI is combined with CUDA-Fortran in order to use as many GPUs as needed. The METIS library is leveraged to perform a domain decomposition on the 2D unstructured triangular meshes of interest. A CUDA-Aware OpenMPI version is adopted to speed up the messages between the MPI processes. A study of both speed-up and efficiency is conducted; first, for a classic dam-break flow in a canal, and then for two real domains with complex bathymetries: the Mille Îles river and the Montreal archipelago. In both cases, meshes with up to 13 million cells are used. Using 24 to 28 GPUs on these meshes leads to an efficiency of 80% and more. Finally, the multi-GPU version is compared to the pure MPI multi-CPU version, and it is concluded that in this particular case, about 100 CPU cores would be needed to achieve the same performance as one GPU.
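As a rough illustration of the decomposition step these abstracts describe, the sketch below partitions a toy cell-adjacency graph with the pymetis binding of METIS (an assumption; the cited solver calls METIS from Fortran, and the mesh here is a made-up 1-D strip) and then collects the first layer of ghost cells for each sub-domain. Repeating the same collection on owned plus ghost cells would yield the next layer.

```python
# Toy sketch of graph partitioning plus a first ghost layer, using the pymetis
# binding of METIS (illustrative; not the cited solver's Fortran/METIS setup).
import pymetis

# Cell adjacency of a tiny 1-D strip of 8 cells: cell i touches i-1 and i+1.
adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)]

_, part = pymetis.part_graph(2, adjacency=adjacency)   # part[i] = owning sub-domain of cell i

def ghost_cells(my_part):
    """First ghost layer of a sub-domain: cells owned elsewhere that touch an owned cell."""
    owned = {i for i, p in enumerate(part) if p == my_part}
    return sorted({j for i in owned for j in adjacency[i] if j not in owned})

for p in (0, 1):
    print(f"sub-domain {p}: owned={[i for i, q in enumerate(part) if q == p]}, "
          f"ghosts={ghost_cells(p)}")
```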
... ParaView provided a Data Decomposition (D3) filter [79] that generated ghost data by repartitioning the data. Patchett et al. [80] presented an algorithm to generate ghost data; this algorithm was integrated into the Visualization Toolkit (VTK) [19]. Each processor exchanged its external boundary information with all other processors. ...
Article
Full-text available
Volume data these days is usually massive in terms of its topology, multiple fields, or temporal component. With the gap between compute and memory performance widening, the memory subsystem becomes the primary bottleneck for scientific volume visualization. Simple, structured, regular representations are often infeasible because the buses and interconnects involved need to accommodate the data required for interactive rendering. In this state‐of‐the‐art report, we review works focusing on large‐scale volume rendering beyond those typical structured and regular grid representations. We focus primarily on hierarchical and adaptive mesh refinement representations, unstructured meshes, and compressed representations that gained recent popularity. We review works that approach this kind of data using strategies such as out‐of‐core rendering, massive parallelism, and other strategies to cope with the sheer size of the ever‐increasing volume of data produced by today's supercomputers and acquisition devices. We emphasize the data management side of large‐scale volume rendering systems and also include a review of tools that support the various volume data types discussed.
Article
This paper shows the development of a multi-GPU version of a time-explicit finite volume solver for the Shallow-Water Equations (SWE) on a multi-GPU architecture. MPI is combined with CUDA-Fortran in order to use as many GPUs as needed, and the METIS library is leveraged to perform a domain decomposition on the 2D unstructured triangular meshes of interest. A CUDA-Aware version of OpenMPI is adopted to speed up the messages between the MPI processes. A study of both speed-up and efficiency is conducted; first, for a classic dam-break flow in a canal, and then for two real domains with complex bathymetries. In both cases, meshes with up to 12 million cells are used. Using 24 to 28 GPUs on these meshes leads to an efficiency of 80% and more. Finally, the multi-GPU version is compared to the pure MPI multi-CPU version, and it is concluded that in this particular case, about 100 CPU cores would be needed to achieve the same performance as one GPU. The developed methodolo…