Fig. 20
(a) Pressure contours around the full SSLV configuration, including the orbiter, external tank, solid rocket boosters, and fore and aft attach hardware, for the benchmarking case described in the text. (b) Parallel scalability of the Cart3D solver module on Columbia using the SSLV example on a 25-million-cell mesh. Runs were conducted on a single 512-CPU node of the Columbia system.


Source publication
Article
This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier–Stokes solver, and a fully-automated inviscid flow package for cut-cell Cartesian grids. The complementary com...


Citations

... Domain decomposition via space-filling curves permits parallel computation, and the solver has demonstrated excellent scalability on several thousand processing cores [16]. This solver has been extensively validated on scores of internal and external aerodynamic flows over a wide range of speeds, including against blast damage and shock-arrival times for the 2013 Chelyabinsk meteor [12]. ...
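The space-filling-curve decomposition this excerpt cites can be illustrated in a few lines: key each Cartesian cell with a Morton (Z-order) index, sort, and hand each rank a contiguous chunk of the sorted list. The sketch below is ours, not Cart3D's code; the toy mesh size, rank count, and bit-interleaving routine are our own choices.

    /* Sketch: partitioning Cartesian cells across ranks with a Morton
     * (Z-order) space-filling curve. Illustrative only. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Spread the low 21 bits of x so they occupy every third bit. */
    static uint64_t spread3(uint64_t x) {
        x &= 0x1fffff;
        x = (x | x << 32) & 0x1f00000000ffffULL;
        x = (x | x << 16) & 0x1f0000ff0000ffULL;
        x = (x | x <<  8) & 0x100f00f00f00f00fULL;
        x = (x | x <<  4) & 0x10c30c30c30c30c3ULL;
        x = (x | x <<  2) & 0x1249249249249249ULL;
        return x;
    }

    /* 63-bit Morton key interleaving the bits of (i, j, k). */
    static uint64_t morton3(uint32_t i, uint32_t j, uint32_t k) {
        return spread3(i) | spread3(j) << 1 | spread3(k) << 2;
    }

    static int cmp_u64(const void *a, const void *b) {
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        enum { N = 4, NRANKS = 4 };            /* 4x4x4 toy mesh, 4 ranks */
        uint64_t keys[N * N * N];
        int n = 0;
        for (uint32_t i = 0; i < N; i++)
            for (uint32_t j = 0; j < N; j++)
                for (uint32_t k = 0; k < N; k++)
                    keys[n++] = morton3(i, j, k);

        /* Sorting by Morton key orders cells along the space-filling
         * curve; equal contiguous chunks then give compact,
         * load-balanced subdomains. */
        qsort(keys, n, sizeof keys[0], cmp_u64);
        int chunk = n / NRANKS;
        for (int r = 0; r < NRANKS; r++)
            printf("rank %d: cells [%d, %d)\n", r, r * chunk, (r + 1) * chunk);
        return 0;
    }

Because the curve preserves spatial locality, neighboring cells tend to land on the same rank, which keeps communication volume low.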
Article
Entry and breakup models predict that an airburst in the Earth's atmosphere is likely for non-metallic asteroids with diameters up to approximately 200 meters (Collins, 2005; Collins, 2017; Wheeler, 2017). Objects of this size can deposit over 250 megatons of energy into the atmosphere. Fast-running ground damage prediction codes for such events rely heavily upon methods developed from nuclear weapons research to estimate the damage potential for an airburst at altitude (Collins, 2005; Mathias, 2017; Rumpf, 2017; Rumpf, 2016; Hills, 1993). In particular, these tools rely upon the powerful yield scaling laws developed for point-source blasts, which are used in conjunction with a Height of Burst (HOB) map to predict ground damage for an airburst of a specific energy at a given altitude. While this approach works extremely well for yields as large as tens of megatons, it becomes less accurate as asteroid size and effective yields increase to the hundreds of megatons potentially released in larger airburst events. Accordingly, this study revisits the assumptions underlying this approach and shows how atmospheric buoyancy becomes important as yield increases beyond a few megatons. We then use large-scale three-dimensional simulations to construct numerically generated Height of Burst maps that are appropriate at the higher energy levels associated with the entry of asteroids with diameters of hundreds of meters. These numerically generated HOB maps can then be incorporated into engineering methods for damage prediction, significantly improving their accuracy for asteroids with diameters greater than 80-100 m.
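For context on the scaling laws mentioned here: point-source HOB maps exploit the classical cube-root (Hopkinson-Cranz) similarity, under which the range at which a given overpressure occurs scales with the cube root of yield W, so a single unit-yield map covers all yields. A hedged sketch of that relation (notation ours):

    \[
      Z = \frac{R}{W^{1/3}}, \qquad
      R_{\Delta p}(W,\,h) \approx W^{1/3}\, R_{\Delta p}\!\left(1,\ \frac{h}{W^{1/3}}\right)
    \]

The abstract's argument is that this similarity degrades once buoyancy matters, beyond a few megatons, which is why numerically generated HOB maps are needed at the energies of larger asteroids.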
... The dataset used is a wing-body-nacelle-pylon geometry (DLR-F6) with 23 zones and 36 million grid points. The input dataset is 1.6 GB in size, and the solution file is 2 GB. 2) CART3D is a high-fidelity, inviscid CFD application that solves the Euler equations of fluid dynamics [23]. It includes a solver called Flowcart, which uses a second-order, cell-centered, finite-volume upwind spatial discretization scheme, in conjunction with a multi-grid accelerated Runge-Kutta method for steady-state cases. ...
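The steady-state driver described in this excerpt, multistage Runge-Kutta pseudo-time marching, can be demonstrated on a toy problem. The sketch below is ours, not Flowcart's source: it uses first-order upwind differences on 1D linear advection, one common three-stage coefficient set, and omits the multigrid acceleration entirely.

    /* Sketch: multistage Runge-Kutta pseudo-time marching to steady
     * state on a 1D upwind-discretized advection problem. */
    #include <stdio.h>
    #include <math.h>

    #define N 64

    static void residual(const double *u, double *r, double dx) {
        r[0] = 0.0;                          /* fixed inflow boundary */
        for (int i = 1; i < N; i++)
            r[i] = -(u[i] - u[i - 1]) / dx;  /* upwind flux balance */
    }

    int main(void) {
        /* One common 3-stage coefficient choice (an assumption here). */
        const double alpha[3] = { 0.25, 0.5, 1.0 };
        double u[N], u0[N], r[N], dx = 1.0 / N, dt = 0.8 * dx;

        for (int i = 0; i < N; i++) u[i] = (i == 0) ? 1.0 : 0.0;

        for (int iter = 0; iter < 2000; iter++) {
            for (int i = 0; i < N; i++) u0[i] = u[i];
            for (int s = 0; s < 3; s++) {        /* RK stages */
                residual(u, r, dx);
                for (int i = 0; i < N; i++)
                    u[i] = u0[i] + alpha[s] * dt * r[i];
            }
            residual(u, r, dx);
            double rnorm = 0.0;
            for (int i = 0; i < N; i++) rnorm += r[i] * r[i];
            if (sqrt(rnorm / N) < 1e-12) {       /* converged to steady state */
                printf("steady after %d iterations\n", iter + 1);
                break;
            }
        }
        return 0;
    }

In the real solver the same stage loop acts as a smoother on each level of the multigrid cycle, which is what makes the approach fast for steady cases.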
Conference Paper
The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing as it promises high potential. In this paper, we examine the feasibility, performance, and scalability of production-quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work presents a comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
... floating-point operations, instruction counts, clock cycles, cache misses/hits, etc. [18]. The tool relies on the Performance Application Programming Interface (PAPI) [13] to access hardware performance counters. In the present study, op_scope was built with PAPI version 4.1.0. ...
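The op_scope tool itself is not public, but the PAPI calls it sits on can be shown directly. This is a minimal sketch of the underlying counter interface only; the event choices and the measured loop are our own, and the low-level calls shown have had this shape since well before the PAPI 4.1.0 release the study used.

    /* Sketch: reading hardware performance counters through PAPI. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void) {
        int evset = PAPI_NULL;
        long long counts[2];
        double x = 1.0;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            exit(1);
        PAPI_create_eventset(&evset);
        PAPI_add_event(evset, PAPI_TOT_INS);   /* instructions retired */
        PAPI_add_event(evset, PAPI_TOT_CYC);   /* total clock cycles   */

        PAPI_start(evset);
        for (int i = 0; i < 1000000; i++)      /* region being measured */
            x = x * 1.0000001 + 0.5;
        PAPI_stop(evset, counts);

        printf("ins=%lld cyc=%lld ipc=%.2f (x=%g)\n",
               counts[0], counts[1], (double)counts[0] / counts[1], x);
        return 0;
    }

Compile and link with -lpapi; which preset events are available on a given core is hardware-dependent and can be checked with the papi_avail utility.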
Conference Paper
Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem micro-architectures, such as the Westmere-EP. HT enables two threads to execute on each core in order to hide latencies related to data access. These two threads can execute simultaneously, filling unused stages in the functional unit pipelines. To aid better understanding of HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new metric of efficiency in order to quantify processor resource utilization and make comparisons of that utilization between single-threading (ST) and HT modes. We also study performance gain using unhalted core cycles, code efficiency of using vector units of the processor, and the impact of HT mode on various shared resources like L2 and L3 cache. Results using four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but does not necessarily translate into overall application performance gain.
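The paper's exact efficiency metric is not reproduced in this excerpt, so the formula below is only an illustration of the general shape such a PMU-derived utilization ratio can take (notation ours, not the authors'):

    \[
      E = \frac{\text{instructions retired, summed over the hardware threads on a core}}
               {\text{unhalted core cycles}}
    \]

computed once in ST mode and once in HT mode; HT pays off when the two-thread ratio exceeds the single-thread one by more than the accompanying loss from shared-cache contention.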
... NSU3D has been shown to scale well on massively parallel computer architectures using up to 4000 cores [21]. The results reported in this paper have been run on the NASA Pleiades machine, using MPI exclusively and using from 32 to 256 cores. Typical wall-clock requirements were approximately 1.5 hours for 1000 solver iterations on the medium grid using 64 cores. ...
Article
Simulation results for the first AIAA CFD High Lift Prediction Workshop using the unstructured computational fluid dynamics code NSU3D are presented. The solution algorithms employed in NSU3D for this study are described along with examples of convergence history and computational cost. The geometry used for the simulation is the NASA three-element swept-wing (Trap Wing) model with experimental data taken in the 14x22-foot wind tunnel at NASA Langley. Computational grids for the study were prepared by the authors with the VGRIDns package using CAD geometry provided by the workshop committee. A grid convergence study was performed using a family of three grids to assess sensitivity to grid resolution. A range of angle-of-attack values from 6° to 37° was completed on the medium grid and the resulting lift, drag and longitudinal moments are compared to the experimental results. The effect of changing the flap angle is investigated with a second model using an equivalent medium grid. Comparisons of surface pressure to experimental data are presented for both configurations. Flow features are also presented using surface-constrained and volume streamlines for selected cases.
... The problem is about 50M points and requires about 32GB of system memory. Cart3D is a simulation package targeted at conceptual and preliminary design of aerospace vehicles with complex geometries [14]. The code uses a topologically unstructured set of Cartesian meshes, and the resultant indirect addressing makes it CPU- and memory-bound, especially for large datasets. ...
Conference Paper
Resource sharing in commodity multicore processors can have a significant impact on the performance of production applications. In this paper we use a differential performance analysis methodology to quantify the costs of contention for resources in the memory hierarchy of several multicore processors used in high-end computers. In particular, by comparing runs that bind MPI processes to cores in different patterns, we can isolate the effects of resource sharing. We use this methodology to measure how such sharing affects the performance of four applications of interest to NASA: OVERFLOW, MITgcm, Cart3D, and NCC. We also use a subset of the HPCC benchmarks and hardware counter data to help interpret and validate our findings. We conduct our study on high-end computing platforms that use four different quad-core microprocessors: Intel Clovertown, Intel Harpertown, AMD Barcelona, and Intel Nehalem-EP. The results help further our understanding of the requirements these codes place on their production environments and also of each computer's ability to deliver performance.
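A minimal placement probe makes the binding-pattern methodology concrete: each rank reports the core it lands on, and re-running under different binding options (for example, Open MPI's --bind-to core versus --map-by socket) changes which ranks share caches and memory controllers. The program is our sketch, not the paper's tooling, and sched_getcpu() is Linux/glibc-specific.

    /* Sketch: report which core each MPI rank is running on. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        char host[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        gethostname(host, sizeof host);

        /* Under pinned launches this mapping is stable run to run. */
        printf("rank %d of %d on %s, core %d\n",
               rank, size, host, sched_getcpu());

        MPI_Finalize();
        return 0;
    }

Comparing application timings between two such placements, with everything else held fixed, is the differential step that isolates the cost of the shared resource.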
... NSU3D has been extensively validated in stand-alone mode, both for steady-state fixed-wing cases as a regular participant in the AIAA Drag Prediction Workshop series [34, 35], as well as for unsteady aerodynamic and aeroelastic problems [36], and has been extensively benchmarked on large parallel computer systems [37]. For operation within an overset environment, an "iblanking" capability has been added. The iblank variable specifies at which nodes the solution variables are to be updated by the near-body solver (iblank = 1) and which nodes are not updated by the solver, i.e., fringes and holes (iblank = 0). ...
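A minimal sketch of the iblanked update loop described here (names are illustrative, not NSU3D's):

    /* Sketch: advance only field nodes; fringe and hole nodes keep
     * values supplied by the overset interpolation. */
    #include <stdio.h>

    void update_nodes(int n, const int *iblank,
                      double *q, const double *dq) {
        for (int i = 0; i < n; i++)
            if (iblank[i] == 1)   /* field node: solver owns it */
                q[i] += dq[i];
            /* iblank == 0: fringe/hole, filled by interpolation elsewhere */
    }

    int main(void) {
        int iblank[4] = { 1, 0, 0, 1 };   /* field, fringe, hole, field */
        double q[4]  = { 1.0, 1.0, 1.0, 1.0 };
        double dq[4] = { 0.5, 0.5, 0.5, 0.5 };
        update_nodes(4, iblank, q, dq);
        for (int i = 0; i < 4; i++) printf("q[%d] = %.1f\n", i, q[i]);
        return 0;
    }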
Article
This article describes the architecture, components, capabilities, and validation of the first version of the Helios platform, targeted towards rotorcraft aerodynamics. Capabilities delivered in the first version include fuselage aerodynamics with and without momentum-disk rotor models, and isolated rotor dynamics for ideal hover and forward flight coupled with aeroelasticity and trim. Helios is based on an overset framework that employs unstructured mixed-element meshes in the near-body domain combined with high-order Cartesian meshes in the off-body domain. In addition, the aerodynamics solution is coupled with structural dynamics and trim using a delta-coupling algorithm. The near-body CFD, off-body CFD, CSD and trim modules are coupled using a Python infrastructure that controls the execution sequence of the solution procedure. Specific validation studies presented include the Slowed Rotor Compound fuselage, Georgia Tech rotor body, TRAM rotor in hover and UH-60A rotor in forward flight. In all cases, Helios predictions are compared with experimental data and other state-of-the-art codes to demonstrate the accuracy, efficiency and scalability of the code.
... Moreover, its computational efficiency diminishes in the case of variable-coefficient operators. Additional examples of scalable approaches for unstructured meshes include [1] and [40]. In those works, multigrid approaches for general elliptic operators were proposed. ...
Article
In this article, we present a parallel geometric multigrid algorithm for solving variable-coefficient elliptic partial differential equations on the unit box (with Dirichlet or Neumann boundary conditions) using highly nonuniform, octree-based, conforming finite element discretizations. Our octrees are 2:1 balanced, that is, we allow no more than one octree-level difference between octants that share a face, edge, or vertex. We describe a parallel algorithm whose input is an arbitrary 2:1 balanced fine-grid octree and whose output is a set of coarser 2:1 balanced octrees that are used in the multigrid scheme. Also, we derive matrix-free schemes for the discretized finite element operators and the intergrid transfer operations. The overall scheme is second-order accurate for sufficiently smooth right-hand sides and material properties; its complexity for nearly uniform trees is O((N/np) log(N/np)) + O(np log np), where N is the number of octree nodes and np is the number of processors. Our implementation uses the Message Passing Interface standard. We present numerical experiments for the Laplace and Navier (linear elasticity) operators that demonstrate the scalability of our method. Our largest run was a highly nonuniform, 8-billion-unknown, elasticity calculation using 32,000 processors on the Teragrid system, "Ranger," at the Texas Advanced Computing Center. Our implementation is publicly available in the Dendro library, which is built on top of the PETSc library from Argonne National Laboratory.
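With linear octrees stored as Morton (locational) codes, the basic coarsening step behind such a multigrid hierarchy reduces to truncating three bits per level and deduplicating, as in this sketch. It is illustrative only: production codes such as Dendro also carry each octant's level explicitly and must re-enforce the 2:1 balance after coarsening.

    /* Sketch: coarsening a sorted list of leaf octants by mapping
     * each Morton code to its parent and emitting parents once. */
    #include <stdio.h>
    #include <stdint.h>

    /* Parent of an octant, given its Morton code at a fixed depth. */
    static uint64_t parent(uint64_t code) { return code >> 3; }

    int main(void) {
        /* Eight sibling leaves share one parent. */
        uint64_t leaves[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
        uint64_t last = UINT64_MAX;
        for (int i = 0; i < 8; i++) {
            uint64_t p = parent(leaves[i]);
            if (p != last)        /* emit each coarse octant once */
                printf("coarse octant %llu\n", (unsigned long long)p);
            last = p;
        }
        return 0;
    }

Because the leaf list is globally Morton-sorted, this pass parallelizes naturally: each processor coarsens its own contiguous chunk and only chunk boundaries need reconciliation.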
... CART3D is a high-fidelity, inviscid CFD application that solves the Euler equations of fluid dynamics [15]. It includes a solver called Flowcart, which uses a second-order, cell-centered, finite-volume upwind spatial discretization scheme, in conjunction with a multi-grid accelerated Runge-Kutta method for steady-state cases. ...
Article
In this paper, we present an early performance evaluation of a 624-core cluster based on the Intel® Xeon® Processor 5560 (code named "Nehalem-EP", and referred to as Xeon 5560 in this paper), the third-generation quad-core architecture from Intel. This is the first processor from Intel with a non-uniform memory access (NUMA) architecture managed by an on-chip integrated memory controller. It employs a point-to-point interconnect called the Intel® QuickPath Interconnect (QPI) between processors and to the input/output (I/O) hub. It also brings to a quad-core architecture both Intel's hyper-threading technology (or simultaneous multi-threading, "SMT") and Intel® Turbo Boost Technology ("Turbo mode"), which automatically allows processor cores to run faster than the base operating frequency if the processor is operating below rated power, temperature, and current specification limits. It can be engaged with any number of cores or logical processors enabled and active. We critically evaluate these features using the High Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four full-scale scientific applications. We compare and contrast the results of a cluster based on the Xeon 5560 with an SGI® Altix® ICE 8200EX cluster of quad-core Intel® Xeon® 5472 Processors ("Xeon 5472" from here on) and another cluster of Intel® Xeon® 5462 Processors ("Xeon 5462"; the Xeon 5400 Series Processors are previous-generation quad-core Intel processors and were code named Harpertown).
... Other methods are also possible, such as multilevel methods, whether used as stand-alone methods substituting for projection methods, as proposed in [24], coupled with projection methods as in [25], or coupled with Krylov methods as preconditioners as proposed in [26]. These methods have been successfully used in a parallel context in compressible flows in [27,28]. Apart from the fact that multilevel methods may be cumbersome in dealing with adaptive or moving meshes, they involve a much higher implementation effort. ...
Article
This paper presents a parallel implementation of fractional solvers for the incompressible Navier–Stokes equations using an algebraic approach. Under this framework, predictor–corrector and incremental projection schemes are seen as sub-classes of the same class, making their differences and similarities apparent. An additional advantage of this approach is to set a common basis for a parallelization strategy, which can be extended to other split techniques or to compressible flows. The predictor–corrector scheme consists in solving the momentum equation and a modified "continuity" equation (namely a simple iteration for the pressure Schur complement) consecutively in order to converge to the monolithic solution, thus avoiding fractional errors. On the other hand, the incremental projection scheme solves only one iteration of the predictor–corrector per time step and adds a correction equation to fulfill the mass conservation. As shown in the paper, these two schemes are very well suited for massively parallel implementation. In fact, when compared with monolithic schemes, simpler solvers and preconditioners can be used to solve the non-symmetric momentum equations (GMRES, Bi-CGSTAB) and to solve the symmetric continuity equation (CG, Deflated CG). This gives good speedup properties of the algorithm. The implementation of the mesh partitioning technique is presented, as well as the parallel performances and speedups for thousands of processors.
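A hedged sketch of the algebraic splitting the abstract describes, with discrete momentum matrix A, gradient G, divergence D, and notation ours: each predictor-corrector pass solves

    \[
      A\,\hat{u}^{k+1} = f - G\,p^{k}, \qquad
      (D A^{-1} G)\,\delta p = D\,\hat{u}^{k+1}, \qquad
      p^{k+1} = p^{k} + \delta p
    \]

iterating to the monolithic solution, with the pressure Schur complement D A^{-1} G replaced by a cheaper symmetric approximation so that CG-type solvers apply; the incremental projection variant performs one such pass per time step plus an end-of-step velocity correction to restore mass conservation.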
... This marked increase in the level of high-performance computing now offers both unprecedented capability and capacity to the aerodynamic simulation community [2]. These systems are capable of aerodynamic simulations with 10^8-10^10 degrees of freedom, offering ever increasing physical fidelity [3,4]. While such extremely large "capability" simulations are becoming commonplace, the engineering community has focused on the enormous capacity of these systems through an increasing reliance on parametric, trade and optimization studies. ...
... These investigations use the parallel multi-level Cartesian Euler solver developed in references [5] and [6] to produce the aerodynamic data. This simulation package has been used extensively on large shared and distributed memory systems and has very good parallel scalability [2,3,7]. The robustness and automation of this simulation package has led to its wide adoption for producing aerodynamic databases in support of engineering analysis and design [8,9,10,11]. ...
Conference Paper
This work examines the level of discretization error in simulation-based aerodynamic databases and introduces strategies for error control. Simulations are performed using a parallel, multi-level Euler solver on embedded-boundary Cartesian meshes. Discretization errors in user-selected outputs are estimated using the method of adjoint-weighted residuals, and we use adaptive mesh refinement to reduce these errors to specified tolerances. Using this framework, we examine the behavior of discretization error throughout a token database computed for a NACA 0012 airfoil consisting of 120 cases. We compare the cost and accuracy of two approaches for aerodynamic database generation. In the first approach, mesh adaptation is used to compute all cases in the database to a prescribed level of accuracy. The second approach conducts all simulations using the same computational mesh without adaptation. We quantitatively assess the error landscape and computational costs in both databases. This investigation highlights sensitivities of the database under a variety of conditions. The presence of transonic shocks or the stiffness in the governing equations near the incompressible limit are shown to dramatically increase discretization error, requiring additional mesh resolution to control. Results show that such pathologies lead to error levels that vary by over a factor of 40 when using a fixed mesh throughout the database. Alternatively, controlling this sensitivity through mesh adaptation leads to mesh sizes which span two orders of magnitude. We propose strategies to minimize simulation cost in sensitive regions and discuss the role of error estimation in database quality.
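The adjoint-weighted residual estimate named above has a standard form (notation ours): with the coarse solution u_H injected into an embedded finer space h, the fine-space residual R_h, and a discrete adjoint psi_h for the output J,

    \[
      \left(\frac{\partial R_h}{\partial u_h}\right)^{T}\!\psi_h
          = \left(\frac{\partial J_h}{\partial u_h}\right)^{T},
      \qquad
      J_h(u_h) - J_h(u_H^h) \approx -\,\psi_h^{T} R_h(u_H^h)
    \]

The cell-wise contributions to this inner product both correct the computed output and flag the cells whose refinement most reduces the remaining error, which is the mechanism the adaptive approach in this abstract uses to hit its prescribed tolerances.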