Chapter

Accelerating Extreme-Scale Numerical Weather Prediction

Abstract

Numerical Weather Prediction (NWP) and climate simulations have been intimately connected with progress in supercomputing since the first numerical forecast was made about 65 years ago. The biggest challenge to state-of-the-art computational NWP arises today from its own software productivity shortfall. The application software at the heart of most NWP services is ill-equipped to efficiently adapt to the rapidly evolving heterogeneous hardware provided by the supercomputing industry. If this challenge is not addressed, it will have dramatic negative consequences for weather and climate prediction and associated services. This article introduces Atlas, a flexible data structure framework developed at the European Centre for Medium-Range Weather Forecasts (ECMWF) to facilitate a variety of numerical discretisation schemes on heterogeneous architectures, as a necessary step towards affordable exascale high-performance simulations of weather and climate. A newly developed hybrid MPI-OpenMP finite volume module built upon Atlas serves as a first demonstration of the parallel performance that can be achieved using Atlas' initial capabilities.
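The hybrid MPI-OpenMP parallelisation mentioned above combines distributed-memory partitioning of the grid with shared-memory threading inside each partition. The minimal sketch below illustrates that pattern only: grid points are split across MPI tasks, each task updates its local points with OpenMP threads, and a global diagnostic is formed with an MPI reduction. The point count and the update kernel are illustrative assumptions, not the Atlas finite volume module itself.

    // Minimal sketch of a hybrid MPI-OpenMP update over distributed grid points.
    // Compile with e.g. "mpicxx -fopenmp"; all quantities here are placeholders.
    #include <mpi.h>
    #include <omp.h>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank = 0, size = 1;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      const long nglobal = 1000000;                      // hypothetical global point count
      const long nlocal  = nglobal / size + (rank < nglobal % size ? 1 : 0);
      std::vector<double> field(nlocal, 1.0);

      // Thread-parallel update of the task-local points (stand-in for a flux kernel).
      #pragma omp parallel for
      for (long i = 0; i < nlocal; ++i)
        field[i] += std::sin(0.001 * i);

      // Task-local partial sum, reduced across all MPI tasks to a global diagnostic.
      double local_sum = 0.0;
      #pragma omp parallel for reduction(+ : local_sum)
      for (long i = 0; i < nlocal; ++i)
        local_sum += field[i];

      double global_sum = 0.0;
      MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      if (rank == 0) std::printf("global sum = %f\n", global_sum);

      MPI_Finalize();
      return 0;
    }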


... For reduced grids such as the ones shown in Figure 4b and Figure 4d, or for uniformly distributed unstructured grids, an "equal regions" domain decomposition is more advantageous [11]-[13]. The "equal regions" partitioning algorithm divides a two-dimensional grid of the sphere (i.e. ...
Preprint
Full-text available
This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs, which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node) and alternative algorithms are explored. Performance portability is addressed through the use of domain specific languages. In this deliverable report, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the aim of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models to specific tools required for post-processing weather forecast products. Atlas provides data structures for building various numerical strategies to solve equations on the sphere or on limited areas of the sphere. These data structures may contain a distribution of points (grid) and, possibly, a composition of elements (mesh), required to implement the desired numerical operations. Atlas can also represent a given field within a specific spatial projection. Atlas is capable of mapping fields between different grids as part of pre- and post-processing stages or as part of coupling processes whose respective fields are discretised on different grids or meshes.
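The grid / mesh / field separation described above can be pictured with a few plain data types. The sketch below uses hypothetical stand-in types chosen for illustration only; it does not reproduce the Atlas API.

    // Hypothetical stand-ins for the concepts described above: a Grid is a
    // distribution of points, a Mesh an optional composition of elements over
    // those points, and a Field holds data attached to the points of a Grid.
    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct PointLonLat { double lon, lat; };              // geographic coordinate (degrees)

    struct Grid {                                         // distribution of points on the sphere
      std::vector<PointLonLat> points;
    };

    struct Mesh {                                         // composition of elements over a Grid
      std::vector<std::array<std::size_t, 3>> triangles;  // node indices per element
    };

    struct Field {                                        // data attached to the grid points
      std::string name;
      std::vector<double> values;                         // one value per point (single level)
    };

    // Fields are created against a grid, so that operators such as gradients or
    // remapping between two grids can be built on the same point distribution.
    inline Field create_field(const Grid& grid, std::string name) {
      return Field{std::move(name), std::vector<double>(grid.points.size(), 0.0)};
    }

    int main() {
      Grid grid;
      grid.points = {{0.0, 0.0}, {90.0, 0.0}, {180.0, 0.0}, {270.0, 0.0}};
      Field t = create_field(grid, "temperature");
      std::printf("field '%s' has %zu values\n", t.name.c_str(), t.values.size());
      return 0;
    }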
Article
The assessment of dissolved oxygen (DO) concentration at the sea surface is essential for comprehending the global ocean oxygen cycle and associated environmental and biochemical processes, as it serves as the primary site for photosynthesis and sea-air exchange. However, limited comprehensive measurements and imprecise numerical simulations have impeded the study of global sea surface DO and its relationship with environmental challenges. This paper presents a novel spatiotemporal information embedding machine-learning framework that provides explanatory insights into the underlying driving mechanisms. By integrating extensive in situ data and high-resolution satellite data, the proposed framework successfully generated high-resolution (0.25° × 0.25°) estimates of DO concentration with exceptional accuracy (R2 = 0.95, RMSE = 11.95 μmol/kg, and test number = 2805) for near-global sea surface areas from 2010 to 2018, with an estimated uncertainty of ±13.02 μmol/kg. The resulting sea surface DO data set exhibits precise spatial distribution and reveals compelling correlations with prominent marine phenomena and environmental stressors. Leveraging its interpretability, our model further revealed the key influence of marine factors on surface DO and their implications for environmental issues. The presented machine-learning framework offers an improved DO data set with higher resolution, facilitating the exploration of oceanic DO variability, deoxygenation phenomena, and their potential consequences for environments.
Chapter
To determine the best method for solving a numerical problem modeled by a partial differential equation, one should consider the discretization of the problem, the computational hardware used and the implementation of the software solution. In solving a scientific computing problem, the level of accuracy can also be important, with some numerical methods being efficient for low accuracy simulations, but others more efficient for high accuracy simulations. Very few high performance benchmarking efforts allow the computational scientist to easily measure such tradeoffs in order to obtain an accurate enough numerical solution at a low computational cost. These tradeoffs are examined in the numerical solution of the one-dimensional Klein-Gordon equation on single cores of an ARM CPU, an AMD x86-64 CPU, two Intel x86-64 CPUs and a NEC SX-ACE vector processor. The work focuses on comparing the speed and accuracy of several high order finite difference spatial discretizations using a conjugate gradient linear solver and a fast Fourier transform based spatial discretization. In addition, implementations using second and fourth order timestepping are also included in the comparison. The work uses accuracy-efficiency frontiers to compare the effectiveness of five hardware platforms.
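For readers unfamiliar with the benchmark problem, the sketch below integrates a 1-D linear Klein-Gordon equation u_tt = u_xx - u on a periodic domain with a second-order central difference in space and a leapfrog step in time. The paper above compares higher-order stencils, FFT-based discretisations and different time integrators; this block only shows the simplest baseline, and all parameters are illustrative.

    // Simplified baseline discretisation of the 1-D linear Klein-Gordon equation
    // u_tt = u_xx - u on a periodic domain: second-order central differences in
    // space, second-order leapfrog in time. Illustration only.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
      const double PI = std::acos(-1.0);
      const int    n  = 512;                       // grid points
      const double L  = 2.0 * PI;                  // periodic domain length
      const double dx = L / n;
      const double dt = 0.25 * dx;                 // CFL-limited time step
      const int    steps = 2000;

      std::vector<double> um(n), u(n), up(n);      // u at t-dt, t, t+dt
      for (int i = 0; i < n; ++i) {                // smooth initial condition, zero velocity
        u[i]  = std::sin(i * dx);
        um[i] = u[i];
      }

      for (int s = 0; s < steps; ++s) {
        for (int i = 0; i < n; ++i) {
          const int im = (i + n - 1) % n, ip = (i + 1) % n;
          const double uxx = (u[ip] - 2.0 * u[i] + u[im]) / (dx * dx);
          up[i] = 2.0 * u[i] - um[i] + dt * dt * (uxx - u[i]);   // leapfrog update
        }
        um.swap(u);                                // rotate time levels
        u.swap(up);
      }

      std::printf("u[0] after %d steps = %f\n", steps, u[0]);
      return 0;
    }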
Conference Paper
AIAA Paper 2018-3497. The paper examines recent advancements in the class of Nonoscillatory Forward-in-Time (NFT) schemes that exploit the implicit LES (ILES) properties of the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA). The reported developments address both global and limited area models spanning a range of atmospheric flows, from the hydrostatic regime at planetary scale down to mesoscale and microscale where flows are inherently non-hydrostatic. All models operate on fully unstructured (and hybrid) meshes and utilize a median dual mesh finite volume discretisation. High performance computations for global flows employ a bespoke hybrid MPI-OpenMP approach and utilise the ATLAS library. Simulations across scales, from a global baroclinic instability epitomising the evolution of weather systems down to stratified orographic flows rich in turbulent phenomena due to gravity-wave breaking in dispersive media, verify the computational advancements and demonstrate the efficacy of ILES both in regularizing large-scale flows at the scale of the mesh resolution and in taking the role of a subgrid-scale turbulence model in simulations of turbulent flows in the LES regime.
Article
Full-text available
Weather and climate models are complex pieces of software which include many individual components, each of which is evolving under pressure to exploit advances in computing to enhance some combination of a range of possible improvements (higher spatio-temporal resolution, increased fidelity in terms of resolved processes, more quantification of uncertainty, etc.). However, after many years of a relatively stable computing environment with little choice in processing architecture or programming paradigm (basically X86 processors using MPI for parallelism), the existing menu of processor choices includes significant diversity, and more is on the horizon. This computational diversity, coupled with ever increasing software complexity, leads to the very real possibility that weather and climate modelling will arrive at a chasm which will separate scientific aspiration from our ability to develop and/or rapidly adapt codes to the available hardware. In this paper we review the hardware and software trends which are leading us towards this chasm, before describing current progress in addressing some of the tools which we may be able to use to bridge the chasm. This brief introduction to current tools and plans is followed by a discussion outlining the scientific requirements for quality model codes which have satisfactory performance and portability, while simultaneously supporting productive scientific evolution. We assert that the existing method of incremental model improvements employing small steps which adjust to the changing hardware environment is likely to be inadequate for crossing the chasm between aspiration and hardware at a satisfactory pace, in part because institutions cannot have all the relevant expertise in house. Instead, we outline a methodology based on large community efforts in engineering and standardisation, which will depend on identifying a taxonomy of key activities – perhaps based on existing efforts to develop domain-specific languages, identify common patterns in weather and climate codes, and develop community approaches to commonly needed tools and libraries – and then collaboratively building up those key components. Such a collaborative approach will depend on institutions, projects, and individuals adopting new interdependencies and ways of working.
Article
Full-text available
Members in ensemble forecasts differ due to the representations of initial uncertainties and model uncertainties. The inclusion of stochastic schemes to represent model uncertainties has improved the probabilistic skill of the ECMWF ensemble by increasing reliability and reducing the error of the ensemble mean. Recent progress, challenges and future directions regarding stochastic representations of model uncertainties at ECMWF are described in this paper. The coming years are likely to see a further increase in the use of ensemble methods in forecasts and assimilation. This will put increasing demands on the methods used to perturb the forecast model. An area that is receiving greater attention than 5 to 10 years ago is the physical consistency of the perturbations. Other areas where future efforts will be directed are the expansion of uncertainty representations to the dynamical core and to other components of the Earth system, as well as the overall computational efficiency of representing model uncertainty.
Article
Full-text available
The steady path of doubling the global horizontal resolution approximately every 8 years in numerical weather prediction (NWP) at the European Centre for Medium-Range Weather Forecasts may be substantially altered with emerging novel computing architectures. It coincides with the need to appropriately address and determine forecast uncertainty with increasing resolution, in particular when convective-scale motions start to be resolved. Blunt increases in the model resolution will quickly become unaffordable and may not lead to improved NWP forecasts. Consequently, there is a need to accordingly adjust proven numerical techniques. An informed decision on the modelling strategy for harnessing exascale, massively parallel computing power thus also requires a deeper understanding of the sensitivity to uncertainty, for each part of the model, and ultimately a deeper understanding of multi-scale interactions in the atmosphere and their numerical realization in ultra-high-resolution NWP and climate simulations. This paper explores opportunities for substantial increases in the forecast efficiency by judicious adjustment of the formal accuracy or relative resolution in the spectral and physical space. One path is to reduce the formal accuracy by which the spectral transforms are computed. The other pathway explores the importance of the ratio used for the horizontal resolution in gridpoint space versus wavenumbers in spectral space. This is relevant for both high-resolution simulations and ensemble-based uncertainty estimation.
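To make the gridpoint/spectral ratio more concrete, the snippet below prints the approximate equatorial grid spacing implied by the commonly used linear, quadratic and cubic relations between a spectral truncation T and the number of gridpoints around the equator (roughly 2(T+1), 3(T+1) and 4(T+1) points, respectively). Operational grids (e.g. octahedral reduced grids) deviate slightly from these counts, so the numbers are rough estimates only.

    // Back-of-the-envelope equatorial grid spacing for linear, quadratic and
    // cubic gridpoint/spectral resolution ratios. Estimates only.
    #include <cstdio>

    int main() {
      const double equator_km = 40075.0;                 // Earth's circumference
      const int truncations[] = {639, 1279};             // example truncations
      const struct { const char* name; int factor; } grids[] = {
          {"linear", 2}, {"quadratic", 3}, {"cubic", 4}};

      for (int T : truncations) {
        for (const auto& g : grids) {
          const int npoints = g.factor * (T + 1);        // points around the equator
          std::printf("T%-5d %-9s grid: ~%d points, ~%.1f km spacing\n",
                      T, g.name, npoints, equator_km / npoints);
        }
      }
      return 0;
    }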
Technical Report
Full-text available
The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation. A multidisciplinary group of Berkeley researchers met nearly two years to discuss this change. Our view is that this evolutionary approach to parallel hardware and software may work for 2- or 8-processor systems, but is likely to face diminishing returns as 16- and 32-processor systems are realized, just as returns fell with greater instruction-level parallelism. We believe that much can be learned by examining the success of parallelism at the extremes of the computing spectrum, namely embedded computing and high performance computing. This led us to frame the parallel landscape with seven questions, and to recommend the following:
  • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems.
  • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
  • Instead of traditional benchmarks, use 13 "dwarfs" to design and evaluate parallel programming models and architectures. (A dwarf is an algorithmic method that captures a pattern of computation and communication.)
  • "Autotuners" should play a larger role than conventional compilers in translating parallel programs.
  • To maximize programmer productivity, future programming models must be more human-centric than the conventional focus on hardware or applications.
  • To be successful, programming models should be independent of the number of processors.
  • To maximize application efficiency, programming models should support a wide range of data types and successful models of parallelism: task-level parallelism, word-level parallelism, and bit-level parallelism.
  • Architects should not include features that significantly affect performance or energy if programmers cannot accurately measure their impact via performance counters and energy counters.
  • Traditional operating systems will be deconstructed and operating system functionality will be orchestrated using libraries and virtual machines.
  • To explore the design space rapidly, use system emulators based on Field Programmable Gate Arrays (FPGAs) that are highly scalable and low cost.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems.
Article
Full-text available
The recursive zonal equal area (EQ) sphere partitioning algorithm is a practical algorithm for partitioning higher dimensional spheres into regions of equal area and small diameter. This paper describes the partition algorithm and its implementation in Matlab, provides numerical results and gives a sketch of the proof of the bounds on the diameter of regions. A companion paper [13] gives details of the proof.
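The following sketch conveys the flavour of such an equal-area construction: one cap at each pole plus latitude collars, with each collar receiving a share of the remaining regions proportional to its area, so that every region has area close to 4π/N. It is a simplified illustration, not the recursive zonal algorithm of the paper, and it makes no attempt to bound region diameters.

    // Simplified equal-area-style partition of the unit sphere into N regions:
    // two polar caps plus latitude collars with region counts proportional to
    // collar area. Illustration only, not the EQ algorithm itself.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
      const int    N  = 64;                         // requested number of regions
      const double PI = std::acos(-1.0);
      const double A  = 4.0 * PI / N;               // target area per region

      // Polar caps: colatitude theta_c such that the cap area 2*pi*(1-cos theta_c) = A.
      const double theta_c = std::acos(1.0 - A / (2.0 * PI));

      // Split the remaining band into collars of height close to sqrt(A), then
      // give each collar a share of the N-2 regions proportional to its area.
      const double band     = PI - 2.0 * theta_c;
      const int    ncollars = std::max(1, (int)std::round(band / std::sqrt(A)));
      const double dtheta   = band / ncollars;

      int assigned = 0;
      std::vector<int> regions(ncollars);
      for (int c = 0; c < ncollars; ++c) {
        const double t0 = theta_c + c * dtheta, t1 = t0 + dtheta;
        const double area = 2.0 * PI * (std::cos(t0) - std::cos(t1));
        regions[c] = std::max(1, (int)std::round(area / A));
        assigned += regions[c];
      }
      regions[ncollars / 2] += (N - 2) - assigned;  // absorb rounding error near the equator

      std::printf("caps: 2, collars: %d\n", ncollars);
      for (int c = 0; c < ncollars; ++c)
        std::printf("collar %2d: %d regions\n", c, regions[c]);
      return 0;
    }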
Article
The paper documents the development of a global nonhydrostatic finite-volume module designed to enhance an established spectral-transform based numerical weather prediction (NWP) model. The module adheres to NWP standards, with formulation of the governing equations based on the classical meteorological latitude-longitude spherical framework. In the horizontal, a bespoke unstructured mesh with finite-volumes built about the reduced Gaussian grid of the existing NWP model circumvents the notorious stiffness in the polar regions of the spherical framework. All dependent variables are co-located, accommodating both spectral-transform and grid-point solutions at the same physical locations. In the vertical, a uniform finite-difference discretisation facilitates the solution of intricate elliptic problems in thin spherical shells, while the pliancy of the physical vertical coordinate is delegated to generalised continuous transformations between computational and physical space. The newly developed module assumes the compressible Euler equations as default, but includes reduced soundproof PDEs as an option. Furthermore, it employs semi-implicit forward-in-time integrators of the governing PDE systems, akin to but more general than those used in the NWP model. The module shares the equal regions parallelisation scheme with the NWP model, with multiple layers of parallelism hybridising MPI tasks and OpenMP threads. The efficacy of the developed nonhydrostatic module is illustrated with benchmarks of idealised global weather.
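As a small, concrete example of a continuous transformation between computational and physical space in the vertical, the sketch below evaluates the classical terrain-following (Gal-Chen and Somerville) mapping. The module described above admits more general transformations; this one is shown only because it is simple and widely known, and the terrain height used is hypothetical.

    // Classical terrain-following vertical coordinate: computational height eta
    // in [0, H] is mapped to physical height z, with z = zs at eta = 0 and
    // z = H at the model top. Illustrative values only.
    #include <cstdio>

    double physical_height(double eta, double zs, double H) {
      return zs + eta * (H - zs) / H;
    }

    int main() {
      const double H  = 20000.0;   // model top [m]
      const double zs = 1500.0;    // hypothetical terrain elevation [m]
      for (double eta = 0.0; eta <= H; eta += 5000.0)
        std::printf("eta = %7.0f m  ->  z = %7.0f m\n", eta, physical_height(eta, zs, H));
      return 0;
    }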
Conference Paper
Since the mid-90s, IFS has used a 2-dimensional scheme for partitioning grid point space to MPI tasks. While this scheme has served ECMWF well, there have nevertheless been some areas of concern, namely the communication overheads for IFS reduced grids at the poles to support the semi-Lagrangian scheme, and the halo requirements needed to support the interpolation of fields between model and radiation grids. These issues have been addressed by the implementation of a new partitioning scheme called EQ_REGIONS, which is characterised by an increasing number of partitions in bands from the poles to the equator. The number of bands and the number of partitions in each particular band are derived so as to provide partitions of equal area and small 'diameter'. The EQ_REGIONS algorithm used in IFS is based on the work of Paul Leopardi, School of Mathematics, University of New South Wales, Sydney, Australia.
Article
Very high resolution spectral transform models are believed to become prohibitively expensive, due to the relative increase in computational cost of the Legendre transforms compared to the gridpoint computations. This article describes the implementation of a practical fast spherical harmonics transform into the Integrated Forecasting System (IFS) at ECMWF. Details of the accuracy of the computations, of the parallelisation and memory use are discussed. Results are presented that demonstrate the cost-effectiveness and accuracy of the fast spherical harmonics transform, successfully mitigating the concern about the disproportionally growing computational cost. Using the new transforms, the first T7999 global weather forecast (equivalent to approximately 2.5km horizontal grid size) using a spectral transform model has been produced.
Article
Integrations of spectral models are presented in which the 'Gaussian' grid of points at which the nonlinear terms are evaluated is reduced as the poles are approached. A maximum saving in excess of one-third of the number of points covering the globe is obtained by requiring that the grid length in the zonal direction does not exceed the grid length at the equator, and that the number of points around a latitude circle enables the use of a fast Fourier transform. The results show that such a reduced grid can be used for short- and medium-range prediction (and presumably also for climate studies) with no significant loss of accuracy compared with the use of a conventional grid, which is uniform in longitude. The saving in computational time is between 20% and 25% for the T106 forecast model.
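The construction described above (zonal grid length bounded by its value at the equator, with FFT-friendly point counts per latitude circle) can be sketched in a few lines. Equally spaced latitudes are used here instead of true Gaussian latitudes, and the grid sizes are arbitrary, so this is an illustration of the rule rather than an actual reduced Gaussian grid.

    // Sketch of a "reduced" grid: the number of points per latitude circle is
    // shrunk towards the poles so the zonal grid length never exceeds the one
    // at the equator, while staying a product of small primes for the FFT.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // Smallest integer >= n whose prime factors are only 2, 3 and 5.
    int next_fft_friendly(int n) {
      for (int m = n;; ++m) {
        int r = m;
        const int primes[] = {2, 3, 5};
        for (int p : primes)
          while (r % p == 0) r /= p;
        if (r == 1) return m;
      }
    }

    int main() {
      const double PI = std::acos(-1.0);
      const int nlat = 24;                       // latitude circles pole to pole
      const int nlon_equator = 96;               // points on the equator

      for (int j = 0; j < nlat; ++j) {
        const double lat = -90.0 + (j + 0.5) * 180.0 / nlat;     // circle centre
        const int needed = (int)std::ceil(nlon_equator * std::cos(lat * PI / 180.0));
        const int nlon   = next_fft_friendly(std::max(needed, 4));
        std::printf("lat %6.1f deg: %3d points\n", lat, nlon);
      }
      return 0;
    }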
Article
A deterministic initial‐value test case for dry dynamical cores of atmospheric general‐circulation models is presented that assesses the evolution of an idealized baroclinic wave in the northern hemisphere. The initial zonal state is quasi‐realistic and completely defined by analytic expressions which are a steady‐state solution of the adiabatic inviscid primitive equations with pressure‐based vertical coordinates. A two‐component test strategy first evaluates the ability of the discrete approximations to maintain the steady‐state solution. Then an overlaid perturbation is introduced which triggers the growth of a baroclinic disturbance over the course of several days. The test is applied to four very different dynamical cores at varying horizontal and vertical resolutions. In particular, the NASA/NCAR Finite Volume dynamics package, the National Center for Atmospheric Research spectral transform Eulerian and the semi‐Lagrangian dynamical cores of the Community Atmosphere Model CAM3 are evaluated. In addition, the icosahedral finite‐difference model GME of the German Weather Service is tested. These hydrostatic dynamical cores represent a broad range of numerical approaches and, at very high resolutions, provide independent reference solutions. The paper discusses the convergence‐with‐resolution characteristics of the schemes and evaluates the uncertainty of the high‐resolution reference solutions.
Article
An arbitrary finite-volume approach is developed for discretising partial differential equations governing fluid flows on the sphere. Unconventionally for unstructured-mesh global models, the governing equations are cast in the anholonomic geospherical framework established in computational meteorology. The resulting discretisation retains proven properties of the geospherical formulation, while it offers the flexibility of unstructured meshes in enabling irregular spatial resolution. The latter allows for a global enhancement of the spatial resolution away from the polar regions as well as for a local mesh refinement. A class of non-oscillatory forward-in-time edge-based solvers is developed and applied to numerical examples of three-dimensional hydrostatic flows, including shallow-water benchmarks, on a rotating sphere.
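A generic picture of the edge-based, finite-volume update underlying such solvers is given below: a flux is computed once per edge of the (dual) mesh and scattered, with opposite signs, to the two control volumes that share the edge. The tiny mesh, the first-order upwind flux and all values are placeholders; they are not the scheme of the paper.

    // Generic edge-based finite-volume update: one flux per edge, scattered to
    // the two control volumes sharing that edge. All data are placeholders.
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Edge { int n1, n2; double sx, sy; };   // node indices and edge-normal area vector

    int main() {
      // A tiny hand-made dual mesh: 4 nodes connected by 4 edges.
      std::vector<Edge>   edges  = {{0, 1, 1.0, 0.0}, {1, 2, 0.0, 1.0},
                                    {2, 3, -1.0, 0.0}, {3, 0, 0.0, -1.0}};
      std::vector<double> volume = {1.0, 1.0, 1.0, 1.0};   // control-volume sizes
      std::vector<double> q      = {1.0, 2.0, 3.0, 4.0};   // transported scalar
      std::vector<double> vx     = {1.0, 1.0, 1.0, 1.0};   // advecting velocity
      std::vector<double> vy     = {0.0, 0.0, 0.0, 0.0};
      const double dt = 0.1;

      std::vector<double> rhs(q.size(), 0.0);
      for (const Edge& e : edges) {
        // First-order upwind flux through the edge, evaluated once per edge.
        const double un   = 0.5 * ((vx[e.n1] + vx[e.n2]) * e.sx + (vy[e.n1] + vy[e.n2]) * e.sy);
        const double flux = un > 0.0 ? un * q[e.n1] : un * q[e.n2];
        rhs[e.n1] -= flux;            // flux leaves control volume n1 ...
        rhs[e.n2] += flux;            // ... and enters control volume n2
      }
      for (std::size_t i = 0; i < q.size(); ++i) q[i] += dt * rhs[i] / volume[i];

      for (std::size_t i = 0; i < q.size(); ++i) std::printf("q[%zu] = %f\n", i, q[i]);
      return 0;
    }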
Mozdzynski, G.: A new partitioning approach for ECMWF's Integrated Forecasting System (IFS). In: Proceedings of the Twelfth ECMWF Workshop: Use of High Performance Computing in Meteorology, 30 October - 3 November 2006, Reading, UK, pp. 148-166. World Scientific (2007)
Smolarkiewicz, P.K., Deconinck, W., Hamrud, M., Kühnlein, C., Mozdzynski, G., Szmelter, J., Wedi, N.P.: A hybrid all-scale finite-volume module for stratified flows on a rotating sphere. J. Comput. Phys. (2016, submitted)