Figure 5
Ghost cells are indicated by dashed lines. With one layer of ghost cells, isocontouring produces correct geometry (the gray triangle) in the local partition. The geometry in the ghost cells, however, is incorrect (red triangles) because the values at points P7 and P8 are faulty. The normal calculated for vertex V3 will therefore be inaccurate, since the triangles in cells C6, C7, and C8 are all incorrect, which results in lighting artifacts when the isocontour is rendered.
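To make the normal problem concrete, here is a minimal NumPy sketch (my own toy coordinates and triangle list, not the data behind the figure) that averages the area-weighted normals of the triangles incident to a shared vertex; corrupting one incident triangle, as a faulty ghost value would, visibly tilts the vertex normal.

```python
import numpy as np

def triangle_normal(a, b, c):
    """Unnormalized normal of triangle (a, b, c); its length is twice the area."""
    return np.cross(b - a, c - a)

def vertex_normal(vertex_id, points, triangles):
    """Average the area-weighted normals of all triangles incident to a vertex."""
    n = np.zeros(3)
    for tri in triangles:
        if vertex_id in tri:
            a, b, c = (points[i] for i in tri)
            n += triangle_normal(a, b, c)
    norm = np.linalg.norm(n)
    return n / norm if norm > 0 else n

# Toy example: three triangles sharing vertex 0 (playing the role of V3).
points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])
triangles = [(0, 1, 2), (0, 2, 3), (0, 3, 4)]

print(vertex_normal(0, points, triangles))   # correct: (0, 0, 1)
points[3, 2] = 0.5                           # corrupt one neighbor, as a bad ghost value would
print(vertex_normal(0, points, triangles))   # tilted normal -> shading seam at the partition boundary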

Source publication
Conference Paper
Full-text available
Ghost cells are important for distributed memory parallel operations that require neighborhood information, and are required for correctness on the boundaries of local data partitions. Ghost cells are one or more layers of grid cells surrounding the external boundary of the local partition, which are owned by other data partitions. They are used by...

Contexts in source publication

Context 1
... single layer of ghost cells supplies enough information for a cell-to-point algorithm over a distributed data set to calculate the correct values for points on the parallel partition boundary. Figure 5 shows the result of using one layer of ghost cells. A topologically correct isocontour is produced with this single layer of ghost cells. ...
Context 2
... the same way an isocontour produces incorrect triangles on the boundary without a first layer of ghost cells, the first layer of ghost cells will produce incorrect triangles without another layer of ghost cells. This problem can be seen in Figure 5. ...
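A minimal 1-D sketch of the cell-to-point step described in these contexts (plain NumPy with made-up cell values; the paper's algorithm operates on general distributed grids): the point on the partition boundary only matches the serial result when the single ghost cell owned by the neighboring partition is included.

```python
import numpy as np

def cell_to_point(cells):
    """1-D cell-to-point average: each point takes the mean of its adjacent cells
    (end points copy their single adjacent cell)."""
    padded = np.concatenate(([cells[0]], cells, [cells[-1]]))
    return 0.5 * (padded[:-1] + padded[1:])

# A global 1-D field of 8 cell values, split into two partitions of 4 cells each.
global_cells = np.arange(8, dtype=float)      # cell values 0..7
reference = cell_to_point(global_cells)       # what a serial run would produce

local = global_cells[:4]                      # partition 0, no ghost cells
with_ghost = global_cells[:5]                 # partition 0 plus one ghost cell from partition 1

print(cell_to_point(local)[4])                # 3.0 -> wrong value at the partition-boundary point
print(cell_to_point(with_ghost)[4])           # 3.5 -> matches the serial result
print(reference[4])                           # 3.5
```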

Citations

... The first part of this paper presents the process of generating the domain decomposition with multiple layers of ghost cells in a parallel manner. The ParMETIS [23] library is used to perform the graph partitioning in parallel, and the multiple layers of ghost cells are added with a parallel algorithm derived from [31]. The performance of this generation process is studied in detail for 2-dimensional and 3-dimensional meshes, both in terms of execution time and memory. ...
... On regular structured meshes, their computation is quite straightforward [40,26]. On unstructured meshes, however, it becomes much more complex and requires dedicated algorithms [27,11,31]. The number of required ghost-layers depends on the discretization's stencil. ...
... To add multiple layers of ghost cells to each sub-domain, we use an algorithm derived from [31]. While the goal in the original paper was to exchange multiple layers of ghost cells once for visualization purposes in Paraview [1], we want to add the ghost cells to each sub-domain while keeping track of where they came from so that we can perform a memory exchange at each time step of the CFD simulations. ...
Article
The resolution of the Shallow-water equations is of practical interest in the study of inundations and often requires very large and dense meshes to accurately simulate river flows. Those large meshes are often decomposed into multiple sub-domains to allow for parallel processing. When such a decomposition process is used in the context of distributed parallel computing, each sub-domain requires an exchange of one or more layers of ghost cells at each time step of the simulation due to the spatial dependency of numerical methods. In the first part of this paper, we show how the domain decomposition and ghost-layer generation process can be performed in a parallel manner for large meshes, and show a new way of storing the resulting sub-domains with all their send/receive information within a single CGNS mesh file. The performance of the ghost-layer generation process is studied both in terms of time and memory on 2D and 3D meshes containing up to 70 million cells. In the second part of the paper, the program developed in the first part is used to generate the domain decomposition of large meshes of practical interest in the study of natural free surface flows. We use our in-house multi-CPU multi-GPU (MPI+CUDA) solver to show the impact of multiple layers of ghost cells on the execution times of the first-order HLLC, and second-order WAF and MUSCL methods. Finally, the parallel solver is used to perform the second-order resolution on real large-scale meshes of rivers near Montréal, using up to 32 GPUs on meshes of 13 million cells.
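The per-time-step exchange described here can be sketched with mpi4py (an illustration under assumed index lists and exactly two ranks, not the authors' CUDA-Fortran solver): each rank records which of its owned cells the neighbor needs and where the neighbor's values land locally, then refreshes its ghost slots before every update.

```python
# Illustrative sketch only: assumes mpi4py, two ranks (mpirun -n 2), and
# hypothetical send/recv index lists standing in for the stored CGNS metadata.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
neighbor = 1 - rank                                # the single neighboring sub-domain

n_owned, n_ghost = 8, 2
cells = np.full(n_owned + n_ghost, float(rank))    # owned values followed by ghost slots

send_ids = np.arange(n_owned - n_ghost, n_owned)   # owned cells the neighbor needs
recv_ids = np.arange(n_owned, n_owned + n_ghost)   # local ghost slots filled by the neighbor

def exchange_ghosts(cells):
    """Refresh ghost values from the neighboring rank (done at every time step)."""
    sendbuf = np.ascontiguousarray(cells[send_ids])
    recvbuf = np.empty(len(recv_ids))
    comm.Sendrecv(sendbuf, dest=neighbor, recvbuf=recvbuf, source=neighbor)
    cells[recv_ids] = recvbuf

for step in range(3):                              # toy "time loop"
    exchange_ghosts(cells)
    cells[:n_owned] = 0.5 * (cells[:n_owned] + cells.mean())   # stand-in for the flux update

if rank == 0:
    print(cells)
```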
... The input data to our algorithm is assumed to be partitioned onto the nodes, so that each node is responsible for a subset of M. We require one layer of ghost cells [37] that is assumed in the input data but could be constructed with one message between all nodes holding adjacent vertices. Within this setting, the neighbors and value of a vertex v are available only on nodes that are responsible for v or any of its neighbors. ...
Article
Full-text available
Contemporary scientific data sets require fast and scalable topological analysis to enable visualization, simplification and interaction. Within this field, parallel merge tree construction has seen abundant recent contributions, with a trend of decentralized, task-parallel or SMP-oriented algorithms dominating in terms of total runtime. However, none of these recent approaches computed complete merge trees on distributed systems, leaving this field to traditional divide and conquer approaches. This paper introduces a scalable, parallel and distributed algorithm for merge tree construction outperforming the previously fastest distributed solution by a factor of around three. This is achieved by a task-parallel identification of individual merge tree arcs by growing regions around critical points in the data, without any need for ordered progression or global data structures, based on a novel insight introducing a sufficient local boundary for region growth.
... As it is only done once for each mesh, this serial domain decomposition is good enough for our purposes. In the future, this decomposition may need to be upgraded by using parallel computing, as in Patchett et al. (2017). ...
Preprint
This paper shows the development of a multi-GPU version of a time-explicit finite volume solver for the Shallow-Water Equations (SWE) on a multi-GPU architecture. MPI is combined with CUDA-Fortran in order to use as many GPUs as needed. The METIS library is leveraged to perform a domain decomposition on the 2D unstructured triangular meshes of interest. A CUDA-Aware OpenMPI version is adopted to speed up the messages between the MPI processes. A study of both speed-up and efficiency is conducted; first, for a classic dam-break flow in a canal, and then for two real domains with complex bathymetries: the Mille Îles river and the Montreal archipelago. In both cases, meshes with up to 13 million cells are used. Using 24 to 28 GPUs on these meshes leads to an efficiency of 80% and more. Finally, the multi-GPU version is compared to the pure MPI multi-CPU version, and it is concluded that in this particular case, about 100 CPU cores would be needed to achieve the same performance as one GPU.
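As a rough illustration of the decomposition step these abstracts describe, the sketch below partitions a toy cell-adjacency graph with the pymetis binding of METIS (an assumption; the cited solver calls METIS from Fortran, and the mesh here is a made-up 1-D strip) and then collects the first layer of ghost cells for each sub-domain. Repeating the same collection on owned plus ghost cells would yield the next layer.

```python
# Toy sketch of graph partitioning plus a first ghost layer, using the pymetis
# binding of METIS (illustrative; not the cited solver's Fortran/METIS setup).
import pymetis

# Cell adjacency of a tiny 1-D strip of 8 cells: cell i touches i-1 and i+1.
adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)]

_, part = pymetis.part_graph(2, adjacency=adjacency)   # part[i] = owning sub-domain of cell i

def ghost_cells(my_part):
    """First ghost layer of a sub-domain: cells owned elsewhere that touch an owned cell."""
    owned = {i for i, p in enumerate(part) if p == my_part}
    return sorted({j for i in owned for j in adjacency[i] if j not in owned})

for p in (0, 1):
    print(f"sub-domain {p}: owned={[i for i, q in enumerate(part) if q == p]}, "
          f"ghosts={ghost_cells(p)}")
```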
... ParaView provided a Data Decomposition (D3) filter [79] that generated ghost data by repartitioning the data. Patchett et al. [80] presented an algorithm to generate ghost data; this algorithm was integrated into the Visualization Toolkit (VTK) [19]. Each processor exchanged its external boundary information with all other processors. ...
Article
Full-text available
Volume data these days is usually massive in terms of its topology, multiple fields, or temporal component. With the gap between compute and memory performance widening, the memory subsystem becomes the primary bottleneck for scientific volume visualization. Simple, structured, regular representations are often infeasible because the buses and interconnects involved need to accommodate the data required for interactive rendering. In this state‐of‐the‐art report, we review works focusing on large‐scale volume rendering beyond those typical structured and regular grid representations. We focus primarily on hierarchical and adaptive mesh refinement representations, unstructured meshes, and compressed representations that gained recent popularity. We review works that approach this kind of data using strategies such as out‐of‐core rendering, massive parallelism, and other strategies to cope with the sheer size of the ever‐increasing volume of data produced by today's supercomputers and acquisition devices. We emphasize the data management side of large‐scale volume rendering systems and also include a review of tools that support the various volume data types discussed.
Article
This paper shows the development of a multi-GPU version of a time-explicit finite volume solver for the Shallow-Water Equations (SWE) on a multi-GPU architecture. MPI is combined with CUDA-Fortran in order to use as many GPUs as needed, and the METIS library is leveraged to perform a domain decomposition on the 2D unstructured triangular meshes of interest. A CUDA-Aware version of OpenMPI is adopted to speed up the messages between the MPI processes. A study of both speed-up and efficiency is conducted; first, for a classic dam-break flow in a canal, and then for two real domains with complex bathymetries. In both cases, meshes with up to 12 million cells are used. Using 24 to 28 GPUs on these meshes leads to an efficiency of 80% and more. Finally, the multi-GPU version is compared to the pure MPI multi-CPU version, and it is concluded that in this particular case, about 100 CPU cores would be needed to achieve the same performance as one GPU. The developed methodolo…