Technical Report

The ECMWF Scalability Programme: Progress and Plans


Abstract

The efficiency of the forecasting system on future high-performance computing and data handling systems is considered one of the key challenges for implementing ECMWF’s ambitious strategy. This was already recognised by ECMWF in 2013 and led to the foundation of the Scalability Programme. The programme aims to address this challenge as a concerted action between the Centre and its Member States, but also draws in the computational science expertise available throughout Europe. This technical memorandum provides an overview of the status of the programme, highlights achievements from the first five years ranging from observational data pre-processing and data assimilation to forecast model design and output data post-processing, and defines the roadmap for the next five years towards a sustainable system that can operate on the expected range of hardware and software technologies. This point in time is crucial because the programme will have a strong focus on implementation and operational benefit in the next period.


... The European Centre for Medium-Range Weather Forecasts (ECMWF) produces 230 TB of data on a typical day and most of the data are stored on magnetic tapes in its archive. This data production is predicted to quadruple within the next decade due to the increased spatial resolution of the forecast model [2][3][4]. Initiatives towards operational predictions with global storm-resolving simulations, such as Destination Earth [5] or DYAMOND [6], at a grid spacing of a couple of kilometers, will further increase the volume of data. ...
... where $\mu_A$, $\mu_B$ are the respective means, $\sigma_A^2$, $\sigma_B^2$ the respective variances and $\sigma_{AB}$ the covariance, with $c_1 = (k_1 L)^2$. For rounded floating-point arrays the decimal error is proportional to the square root of the dissimilarity, $1 - \mathrm{SSIM}$ (Supplementary Fig. 5c). The SSIM in this case is approximately equal to the correlation, as round-to-nearest is bias-free (that is, $\mu_A \approx \mu_B$) and the rounding error is typically much smaller than the standard deviation of the data (that is, $\sigma_A \approx \sigma_B$). Here, we use the logarithmic SSIM, $\mathrm{SSIM}_{\log}(A, B) = \mathrm{SSIM}(\log A, \log B)$, which is the SSIM applied to log-preprocessed data (the logarithm is applied element-wise). ...
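For concreteness, the quantities quoted above correspond to the standard single-window (global) SSIM; a minimal NumPy sketch is given below. The constants k1 = 0.01, k2 = 0.03 and c2 = (k2 L)^2, the use of one global window, and the float16 cast used to mimic mantissa rounding are conventional assumptions made for illustration, not details taken from the cited paper.

```python
import numpy as np

def ssim_global(A, B, k1=0.01, k2=0.03):
    """Single-window (global) SSIM. c1 = (k1*L)**2 as quoted above; c2 = (k2*L)**2,
    k1 = 0.01 and k2 = 0.03 are the conventional choices and assumed here."""
    L = max(A.max() - A.min(), B.max() - B.min())    # dynamic range
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_a, mu_b = A.mean(), B.mean()
    var_a, var_b = A.var(), B.var()
    cov_ab = ((A - mu_a) * (B - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_log(A, B):
    """Logarithmic SSIM: SSIM applied to element-wise log-preprocessed data."""
    return ssim_global(np.log(A), np.log(B))

# Crude stand-in for mantissa rounding: casting to float16 and back keeps 10 mantissa bits.
rng = np.random.default_rng(0)
A = np.exp(rng.standard_normal((100, 100)))          # strictly positive test field
B = A.astype(np.float16).astype(np.float64)
print("1 - SSIM_log(A, B) =", 1.0 - ssim_log(A, B))
```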
Article
Full-text available
Hundreds of petabytes are produced annually at weather and climate forecast centers worldwide. Compression is essential to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data, leaving the level of meaningful precision unassessed. Here we define the bitwise real information content from information theory for the Copernicus Atmospheric Monitoring Service (CAMS). Most variables contain fewer than 7 bits of real information per value and are highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. All CAMS data are 17× compressed relative to 64-bit floats, while preserving 99% of real information. Combined with four-dimensional compression, factors beyond 60× are achieved. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data.
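The "rounding bits without real information to zero" step can be pictured in a few lines of NumPy. This is a generic keepbits rounding of float32 mantissas, not the authors' implementation; the choice of 7 kept bits echoes the figure quoted in the abstract, and special values (NaN, Inf, overflow at the largest finite floats) are deliberately not handled in this sketch.

```python
import numpy as np

def round_keepbits(a, keepbits):
    """Round a float32 array to `keepbits` mantissa bits (round to nearest, ties away
    from zero) and set the discarded bits to zero, so that the resulting trailing-zero
    runs can be exploited by standard lossless compressors."""
    assert a.dtype == np.float32 and 0 <= keepbits <= 23
    drop = 23 - keepbits                                 # mantissa bits to discard
    if drop == 0:
        return a.copy()
    ui = a.view(np.uint32)
    half = np.uint32(1 << (drop - 1))                    # half a unit in the last kept place
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)  # keeps sign, exponent, leading mantissa bits
    return ((ui + half) & mask).view(np.float32)

x = (100 * np.random.default_rng(0).standard_normal(1000)).astype(np.float32)
x7 = round_keepbits(x, 7)                                # e.g. 7 real information bits per value
print("max relative rounding error:", float(np.max(np.abs(x7 - x) / np.abs(x))))
```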
... The comprehensive linear physics package of ECMWF was constructed through the former, manual approach after first constructing simplified and "regularized" versions of the nonlinear physics, which are computationally cheaper and give more stable behavior after linearization (Janisková & Lopez, 2013). This technique has been used successfully so far for the radiation, vertical diffusion, unresolved gravity wave drag, convection and clouds and precipitation schemes, but could present limits to the overall scalability of the data assimilation system in the future (Bauer et al., 2020). Due to the strict nature of the linear code, namely that the nonlinear, tangent-linear and adjoint models must be formulated in a mutually consistent way, it is harder to port these model components to novel computational hardware, such as graphics processing units, in their current form. ...
... We did not focus on this here as the cost of the non-orographic gravity wave drag scheme is in any case only around 1% of the total cost of the model evaluation. However, given that the physical parametrizations as a whole are often 25% of the total cost of a model integration (Bauer et al., 2020), the computational cost savings from neural network emulators could prove to be their main asset, and this is no less of a concern for tangent-linear and adjoint model integrations. ...
Article
Full-text available
We assess the ability of neural network emulators of physical parametrization schemes in numerical weather prediction models to aid in the construction of linearized models required by four‐dimensional variational (4D‐Var) data assimilation. Neural networks can be differentiated trivially, and so if a physical parametrization scheme can be accurately emulated by a neural network then its tangent‐linear and adjoint versions can be obtained with minimal effort, compared with the standard paradigms of manual or automatic differentiation of the model code. Here we apply this idea by emulating the non‐orographic gravity wave drag parametrization scheme in an atmospheric model with a neural network, and deriving its tangent‐linear and adjoint models. We demonstrate that these neural network‐derived tangent‐linear and adjoint models not only pass the standard consistency tests but also can be used successfully to do 4D‐Var data assimilation. This technique holds the promise of significantly easing maintenance of tangent‐linear and adjoint codes in weather forecasting centers, if accurate neural network emulators can be constructed.
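The core idea, that an emulator built from differentiable building blocks yields its tangent-linear and adjoint "for free", can be illustrated with a toy one-hidden-layer network whose Jacobian-vector and transpose-Jacobian-vector products are written out by hand. The network, weights and sizes below are arbitrary stand-ins, not the gravity wave drag emulator of the paper; the final lines perform the usual adjoint consistency check <dy, M dx> = <M^T dy, dx>.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy emulator y = W2 tanh(W1 x + b1) + b2 standing in for a parametrization scheme
# (weights are random here; a real emulator would be trained on model data).
n_in, n_h, n_out = 10, 32, 10
W1, b1 = rng.standard_normal((n_h, n_in)), rng.standard_normal(n_h)
W2, b2 = rng.standard_normal((n_out, n_h)), rng.standard_normal(n_out)

def nonlinear(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def tangent_linear(x, dx):
    """Jacobian-vector product M(x) dx, i.e. the tangent-linear model."""
    s = np.tanh(W1 @ x + b1)
    return W2 @ ((1.0 - s**2) * (W1 @ dx))

def adjoint(x, dy):
    """Transpose-Jacobian-vector product M(x)^T dy, i.e. the adjoint model."""
    s = np.tanh(W1 @ x + b1)
    return W1.T @ ((1.0 - s**2) * (W2.T @ dy))

# Adjoint consistency test: <dy, M dx> == <M^T dy, dx> up to round-off.
x, dx, dy = rng.standard_normal(n_in), rng.standard_normal(n_in), rng.standard_normal(n_out)
lhs = dy @ tangent_linear(x, dx)
rhs = adjoint(x, dy) @ dx
print(f"adjoint test relative error: {abs(lhs - rhs) / abs(lhs):.2e}")
```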
... Even within Earth system models as a whole, a "digital revolution" has been called for [20], where harnessing efficiency in modern hardware is central. Computers can increasingly be customised as hardware is becoming more heterogeneous, meaning that different components for data movement and processing can be combined [21]. Examples of such heterogeneous hardware include the so-called Graphics Processing Units (GPUs), Tensor Processing Units, Field-Programmable Gate Arrays, and Application-Specific Integrated Circuits, most of which are highly compatible with ML. ...
... Examples of such heterogeneous hardware include the so-called Graphics Processing Units (GPUs), Tensor Processing Units, Field-Programmable Gate Arrays, and Application-Specific Integrated Circuits, most of which are highly compatible with ML. To take advantage of this heterogeneous hardware and make current ocean models "portable", a significant effort would be necessary [21]. Current ocean models use the Fortran programming language and are parallelised to run on many processors via interfaces such as MPI and OpenMP. ...
Article
Full-text available
Progress within physical oceanography has been concurrent with the increasing sophistication of tools available for its study. The incorporation of machine learning (ML) techniques offers exciting possibilities for advancing the capacity and speed of established methods and for making substantial and serendipitous discoveries. Beyond vast amounts of complex data ubiquitous in many modern scientific fields, the study of the ocean poses a combination of unique challenges that ML can help address. The observational data available is largely spatially sparse, limited to the surface, and with few time series spanning more than a handful of decades. Important timescales span seconds to millennia, with strong scale interactions and numerical modelling efforts complicated by details such as coastlines. This review covers the current scientific insight offered by applying ML and points to where there is imminent potential. We cover the main three branches of the field: observations, theory, and numerical modelling. Highlighting both challenges and opportunities, we discuss both the historical context and salient ML tools. We focus on the use of ML in situ sampling and satellite observations, and the extent to which ML applications can advance theoretical oceanographic exploration, as well as aid numerical simulations. Applications that are also covered include model error and bias correction and current and potential use within data assimilation. While not without risk, there is great interest in the potential benefits of oceanographic ML applications; this review caters to this interest within the research community.
... Therefore, it becomes one of the system bottlenecks restricting the performance improvement of the spectral model. According to figure 5 of Bauer et al. (2020), the spectral transforms (including data transposition) constitute around 21% of the total computational cost in the Integrated Forecasting System (IFS) at TCo1279L137 resolution. The fast spherical harmonic transform is proposed in Tygert (2010), Wedi et al. (2013) and Yin et al. (2018, 2019) to alleviate the problem caused by the spherical harmonic transform. ...
... The fast spherical harmonic transform is proposed in Tygert (2010), Wedi et al. (2013) and Yin et al. (2018, 2019) to alleviate the problem caused by the spherical harmonic transform. However, the data transpositions in the spherical harmonic transform require frequent data communication, which further results in poor scalability (Bauer et al., 2020). ...
Article
Full-text available
In this article, we describe an implementation of single‐precision fast spherical harmonic transform (SHT) in the Yin–He global spectral model (YHGSM). The potential of single‐precision arithmetic is explored for accelerating the fast SHT. In particular, we assess the impact of using single‐precision fast SHT on the meteorological skill of YHGSM. Compared to the double‐precision SHT, the single‐precision fast SHT is accelerated by about 59.83%, while the run‐time of the model integration for TL2047L137 is reduced by 25.28%. The simulation results indicate that single‐precision fast SHT can improve computational efficiency without a noticeable impact on the forecast skill.
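As a rough illustration of where the single-precision saving comes from, the Legendre-transform stage of a spectral model is dominated by dense products between precomputed associated Legendre polynomial values and spectral coefficients, which can be evaluated in float32 instead of float64. The sketch below uses arbitrary matrix sizes and random data, not YHGSM's, and the measured speedup depends entirely on the BLAS library and hardware.

```python
import numpy as np
from time import perf_counter

# Stand-in for one Legendre-transform stage: a dense product between precomputed
# associated Legendre polynomial values and spectral coefficients for all fields/levels.
nlat, nspec, nfld = 2048, 2048, 137
rng = np.random.default_rng(1)
P = rng.standard_normal((nlat, nspec))               # float64 Legendre polynomial values
coeffs = rng.standard_normal((nspec, nfld))          # float64 spectral coefficients

t0 = perf_counter(); ref = P @ coeffs; t64 = perf_counter() - t0
P32, c32 = P.astype(np.float32), coeffs.astype(np.float32)
t0 = perf_counter(); out = P32 @ c32; t32 = perf_counter() - t0

rel_err = np.max(np.abs(out - ref)) / np.max(np.abs(ref))
print(f"float64: {t64 * 1e3:.1f} ms, float32: {t32 * 1e3:.1f} ms, max relative error: {rel_err:.1e}")
```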
... The HRES, a high-resolution configuration of the Integrated Forecasting System (IFS) utilized by the ECMWF, takes one hour to produce a 10-day weather forecast at a detailed latitude and longitude resolution of 0.1 degrees [65]. Similarly, the IFS needs about an hour and a half on 1530 Cray XC40 computer nodes to complete a 15-day ensemble forecast with a resolution of 18 km [102]. In developed countries, it is common to use supercomputers for detailed weather predictions. ...
Article
Full-text available
Accurate and rapid weather forecasting and climate modeling are universal goals in human development. While Numerical Weather Prediction (NWP) remains the gold standard, it faces challenges like inherent atmospheric uncertainties and computational costs, especially in the post-Moore era. With the advent of deep learning, the field has been revolutionized through data-driven models. This paper reviews the key models and significant developments in data-driven weather forecasting and climate modeling. It provides an overview of these models, covering aspects such as dataset selection, model design, training process, computational acceleration, and prediction effectiveness. Data-driven models trained on reanalysis data can provide effective forecasts with an accuracy (ACC) greater than 0.6 for up to 15 days at a spatial resolution of 0.25°. These models outperform or match the most advanced NWP methods for 90% of variables, reducing forecast generation time from hours to seconds. Data-driven climate models can reliably simulate climate patterns for decades to 100 years, offering orders of magnitude in computational savings and competitive performance. Despite their advantages, data-driven methods have limitations, including poor interpretability, challenges in evaluating model uncertainty, and conservative predictions in extreme cases. Future research should focus on larger models, integrating more physical constraints, and enhancing evaluation methods.
... They are especially reliable for predicting large-scale weather patterns and phenomena. However, solving complex differential equations requires significant computational resources 7 . Over the years, several semiempirical parameterizations of lightning flashes have been developed for numerical cloud models 8 . ...
Article
Full-text available
Traditional fully-deterministic algorithms, which rely on physical equations and mathematical models, have been the backbone of many scientific disciplines for decades. These algorithms are based on well-established principles and laws of physics, enabling a systematic and predictable approach to problem-solving. On the other hand, AI-based strategies emerge as a powerful tool for handling vast amounts of data and extracting patterns and relationships that might be challenging to identify through traditional algorithms. Here, we bridge these two realms by using AI to find an optimal mapping into lightning flash occurrence of meteorological features predicted two days ahead by the state-of-the-art numerical weather prediction model of the European Centre for Medium-Range Weather Forecasts (ECMWF). The prediction capability of the resulting AI-enhanced algorithm turns out to be significantly higher than that of the fully-deterministic algorithm employed in the ECMWF model. A remarkable Recall peak of about 95% within the 0-24 h forecast interval is obtained. This performance surpasses the 85% achieved by the ECMWF model at the same Precision as the AI algorithm.
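For reference, the Recall and Precision quoted above are the standard contingency-table scores for a binary lightning-occurrence forecast; a small NumPy example follows, in which the event frequency, hit rate and false-alarm rate are entirely synthetic.

```python
import numpy as np

# Recall and Precision for a binary lightning-occurrence forecast, computed from the
# contingency table. All rates below are made up for illustration.
rng = np.random.default_rng(5)
n = 10_000
observed = rng.random(n) < 0.10                       # cells with observed lightning
forecast = (observed & (rng.random(n) < 0.95)) | (~observed & (rng.random(n) < 0.02))

tp = np.sum(forecast & observed)                      # hits
fp = np.sum(forecast & ~observed)                     # false alarms
fn = np.sum(~forecast & observed)                     # misses
print(f"Recall    = TP / (TP + FN) = {tp / (tp + fn):.2f}")
print(f"Precision = TP / (TP + FP) = {tp / (tp + fp):.2f}")
```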
... The atmospheric state is divided into discrete grids to solve the PDEs. The NWP methods have achieved high prediction accuracy (Schultz et al., 2021), but it is troubled by computational costs (Bauer et al., 2020;Bauer et al., 2015), especially when the amount of observation data continues to grow, and it is difficult to effectively parallelize the NWP methods. In addition, the formulas used by the NWP methods and the approximation assumptions made during the calculation process introduce computational errors, which may increase with iteration or incomplete or inaccurate analysis data. ...
Article
Full-text available
Due to the considerable computational demands of physics-based numerical weather prediction, especially when modeling fine-grained spatio-temporal atmospheric phenomena, deep learning methods offer an advantageous approach by leveraging specialized computing devices to accelerate training and significantly reduce computational costs. Consequently, the application of deep learning methods has presented a novel solution in the field of weather forecasting. In this context, we introduce a groundbreaking deep learning-based weather prediction architecture known as Hierarchical U-Net (HU-Net) with re-parameterization techniques. The HU-Net comprises two essential components: a feature extraction module and a U-Net module with re-parameterization techniques. The feature extraction module consists of two branches. First, the global pattern extraction employs adaptive Fourier neural operators and self-attention, well-known for capturing long-term dependencies in the data. Second, the local pattern extraction utilizes convolution operations as fundamental building blocks, highly proficient in modeling local correlations. Moreover, a feature fusion block dynamically combines dual-scale information. The U-Net module adopts RepBlock with re-parameterization techniques as the fundamental building block, enabling efficient and rapid inference. In extensive experiments carried out on the large-scale weather benchmark dataset WeatherBench at a resolution of 1.40625°, the results demonstrate that our proposed HU-Net outperforms other baseline models in both prediction accuracy and inference time.
... Figure 6), while showing unprecedented long-term stability for a year-long rollout. More importantly, a one-year-long rollout of the SFNO is computed in 12.8 minutes on a single NVIDIA A6000 GPU, compared to one hour (wall-clock time) for a year-long simulation of IFS on 1000 dual-socket CPU nodes (Bauer et al., 2020). With the caveat of differing hardware, this corresponds to a speedup of close to 5,000x. ...
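As a rough check of the quoted factor using only the numbers in the excerpt: 1,000 nodes running for 60 minutes amount to about 60,000 node-minutes for the IFS simulation, versus 12.8 minutes on one GPU, and 60,000 / 12.8 ≈ 4,700, hence "close to 5,000x" once the stated caveat of differing hardware is accepted.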
Preprint
Fourier Neural Operators (FNOs) have proven to be an efficient and effective method for resolution-independent operator learning in a broad variety of application areas across scientific machine learning. A key reason for their success is their ability to accurately model long-range dependencies in spatio-temporal data by learning global convolutions in a computationally efficient manner. To this end, FNOs rely on the discrete Fourier transform (DFT), however, DFTs cause visual and spectral artifacts as well as pronounced dissipation when learning operators in spherical coordinates since they incorrectly assume a flat geometry. To overcome this limitation, we generalize FNOs on the sphere, introducing Spherical FNOs (SFNOs) for learning operators on spherical geometries. We apply SFNOs to forecasting atmospheric dynamics, and demonstrate stable autoregressive rollouts for a year of simulated time (1,460 steps), while retaining physically plausible dynamics. The SFNO has important implications for machine learning-based simulation of climate dynamics that could eventually help accelerate our response to climate change.
... As discussed by the authors, the need to rethink climate model simulations is critical as the increasing computational power and high-resolution modeling lead to larger and larger outputs. As an example discussed by Klöwer et al. (2021), the European Centre for Medium-Range Weather Forecasts produces 230 TB of data on a typical day, and this data production is expected to quadruple within the next decade because of the increased spatial resolution of the forecast model (Bauer et al. 2020). In that vein, recent works have studied the use of single and mixed precision in climate modeling in order to tackle large amounts of data while ensuring forecast quality (Váňa et al. 2017;Tintó Prims et al. 2019). ...
Article
Full-text available
We thank the authors for this interesting paper that highlights important ideas and concepts for the future of climate model ensembles and their storage, as well as future uses of stochastic emulators. Stochastic emulators are particularly relevant because of the statistical nature of climate model ensembles, as discussed in previous work of the authors (Castruccio et al. in J Clim 32:8511–8522, 2019; Hu and Castruccio in J Clim 34:8409–8418, 2021). We thank the authors for sharing some of their data with us in order to illustrate this discussion. In the following, in Sect. 1 we discuss alternative techniques currently used and studied, namely lossy compression and ideas emerging from the climate modeling community, that could feed the discussion on ensemble and storage. In that section, we also present numerical results of compression performed on the data shared by the authors. In Sect. 2, we discuss the current statistical model proposed by the authors and its context. We discuss other potential uses of stochastic emulators in climate and Earth modeling.
... They are orders of magnitude slower than the inference of large NNs. Thus, DA for operational numerical weather prediction is considered a high-performance computing challenge [6]. ...
Article
Full-text available
The outstanding breakthroughs of deep learning in computer vision and natural language processing have been the horn of plenty for many recent developments in the climate sciences. These methodological advances currently find applications to subgrid-scale parameterization, data-driven model error correction, model discovery, surrogate modeling, and many other uses. In this perspective article, I will review recent advances in the field, specifically in the thriving subtopic defined by the intersection of dynamical systems in geosciences, data assimilation, and machine learning, with striking applications to physical model error correction. I will give my take on where we are in the field and why we are there and discuss the key perspectives. I will describe several technical obstacles to implementing these new techniques in a high-dimensional, possibly operational system. I will also discuss open questions about the combined use of data assimilation and machine learning and the short- vs. longer-term representation of the surrogate (i.e., neural network-based) dynamics, and finally about uncertainty quantification in this context.
... Of the above-mentioned methods, we can further comment on the practical scalability of the spherical harmonics and explicit convolution, possibly the two fastest implementations of Equation (A1) for existing global model grids. Following Chatterjee et al. (2018) and Bauer et al. (2020), the computational cost and the scalability of the spectral transforms are limited by the global fast Fourier transform algorithms, which are communication bound due to the use of all-to-all communications in the global matrix transpose and matrix-matrix multiplications. The explicit convolution methods (Normalized Interpolated Convolution from an Adaptive Subgrid, NICAS) of Ménétrier and Auligné (2015) provide one of the fastest implementations of Equation (A1). ...
Article
Full-text available
Lack of efficient ways to include parameterized error covariance in ensemble‐based local volume solvers (e.g. the local ensemble‐transform Kalman filter – the LETKF) remains an outstanding problem in data assimilation. Here, we describe two new algorithms: GETKF‐OI and LETKF‐OI. These algorithms are similar to the traditional optimal interpolation (OI) algorithm in that they use parameterized error covariance to update each of the local volume solutions. However, unlike the traditional OI that scales poorly as the number of observations increases, the new algorithms achieve linear scalability by using either the observational‐space localization strategy of the traditional LETKF algorithm or the modulated ensembles of the gain‐form (GETKF) algorithm. In our testing with a simple one‐dimensional univariate system, we find that the GETKF‐OI algorithm can recover the exact solution within the truncation bounds of the modulated ensemble and the LETKF‐OI algorithm achieves a close approximation to the exact solution. We also demonstrate how to extend GETKF‐OI algorithm to a toy multivariate system with balance constraints.
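For readers unfamiliar with OI, the parameterized-covariance update that the new algorithms embed in each local volume is the standard analysis equation x_a = x_b + B H^T (H B H^T + R)^(-1) (y - H x_b). The sketch below is that generic update on a toy one-dimensional grid, not the GETKF-OI or LETKF-OI algorithms themselves; the Gaussian covariance, grid size and observation locations are arbitrary.

```python
import numpy as np

def oi_update(xb, B, H, R, y):
    """Generic OI/Kalman analysis update with a parameterized background-error
    covariance B and observation-error covariance R:
    xa = xb + K (y - H xb), with K = B H^T (H B H^T + R)^(-1)."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return xb + K @ (y - H @ xb)

# Toy one-dimensional "local volume": 5 grid points, observations at points 1 and 3.
n = 5
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
B = np.exp(-0.5 * (dist / 2.0) ** 2)                  # Gaussian-correlated background errors
H = np.zeros((2, n)); H[0, 1] = H[1, 3] = 1.0
R = 0.1 * np.eye(2)
xb = np.zeros(n)
y = np.array([1.0, -0.5])
print(oi_update(xb, B, H, R, y))
```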
... The recent explosion of hardware architectures has pushed many research groups and operational centers around the world to rethink their software apparatus to better exploit the computational resources offered by the new technologies. There is a large consensus that this renovation process should be guided by a reconsideration of the algorithmic framework of weather and climate models (Bauer et al., 2020). Under the light of these premises, we presented the outcome of the theoretical and numerical activity which accompanied the development of a Python library for building flexible and modular Earth system models. ...
Article
Full-text available
Six strategies to couple the dynamical core with physical parameterizations in atmospheric models are analyzed from a numerical perspective. Thanks to a suitably designed theoretical framework featuring a high level of abstraction, the truncation error analysis and the linear stability study are carried out under weak assumptions. Indeed, second‐order conditions are derived which are not influenced either by the specific formulation of the governing equations, nor by the number of parameterizations, nor by the structural design and implementation details of the time‐stepping methods. The theoretical findings are verified on two idealized test beds. Particularly, a hydrostatic model in isentropic coordinates is used for vertical slice simulations of a moist airflow past an isolated mountain. Self‐convergence tests show that the sensitivity of the prognostic variables to the coupling scheme may vary. For those variables (e.g., momentum) whose evolution is mainly driven by the dry dynamics, the truncation error associated with the dynamical core dominates and hides the error due to the coupling. In contrast, the coupling error of moist variables (e.g., the precipitation rate) emerges gradually as the spatio‐temporal resolution increases. Eventually, each coupling scheme tends toward the formal order of accuracy, upon a careful treatment of the grid cell condensation. Indeed, the well‐established saturation adjustment may cap the convergence rate to first order. A prognostic formulation of the condensation and evaporation process is derived from first principles. This solution is shown effective to alleviate the convergence issues in our experiments. Potential implications for a complete forecasting system are discussed.
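The link between the coupling strategy and the observed convergence order can be seen already in a scalar toy problem. The sketch below compares first-order (sequential, Lie) and second-order (symmetrized, Strang) operator splitting for dy/dt = D(y) + P(y); it is a generic illustration under simplifying assumptions, not one of the six coupling strategies analyzed in the paper.

```python
import numpy as np

# "Dynamics" D(y) = -y and "physics" P(y) = y^2, each sub-step solved exactly.
y0, T = 0.5, 1.0
exact = 1.0 / (1.0 + np.exp(T))        # exact solution of dy/dt = -y + y^2 with y(0) = 0.5

def step_dyn(y, dt): return y * np.exp(-dt)        # exact flow of dy/dt = -y
def step_phy(y, dt): return y / (1.0 - y * dt)     # exact flow of dy/dt = y^2

def integrate(dt, strang):
    y = y0
    for _ in range(int(round(T / dt))):
        if strang:                                  # symmetrized (Strang) coupling
            y = step_dyn(y, 0.5 * dt); y = step_phy(y, dt); y = step_dyn(y, 0.5 * dt)
        else:                                       # sequential (Lie) coupling
            y = step_dyn(y, dt); y = step_phy(y, dt)
    return y

for name, strang in (("Lie", False), ("Strang", True)):
    e1, e2 = abs(integrate(0.02, strang) - exact), abs(integrate(0.01, strang) - exact)
    print(f"{name:6s} splitting: observed order of convergence ~ {np.log2(e1 / e2):.2f}")
```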
... by NNs. ECMWF's Scalability Programme (Bauer et al., 2020) is currently adapting existing parametrization schemes to be GPU portable, enabling a GPU comparison in the future. ...
Article
Full-text available
Plain Language Summary The ability of computers to construct models from data (machine learning) has had significant impacts on many areas of science. Here, we use this ability to construct a model of an element of a numerical weather forecasting system. This element captures one physical process in the model, a part of the model that describes the propagation of large‐scale waves through the atmosphere, but the long‐term aim would be to make many models each capturing a process. The goal is that the computer‐generated model will perform the task more efficiently than the existing model. Testing is then carried out to ensure that our computer model performs as accurately as the existing model. This is a challenging step, as learning is carried out over short time periods (seconds), but forecasts need to be accurate over years. Our computer‐generated models produce accurate forecasts on all tested timescales. On current computers, they are not faster, but will be if weather forecasting centers invest in computers with graphics processing units.
... Additional development towards increased model efficiency is the ability to run on accelerated hardware such as Graphics Processing Units (GPUs). An effort towards porting the surface model to run efficiently on GPUs has already started at ECMWF under the umbrella of the Scalability programme [75]. ...
Article
Full-text available
The land-surface developments of the European Centre for Medium-range Weather Forecasts (ECMWF) are based on the Carbon-Hydrology Tiled Scheme for Surface Exchanges over Land (CHTESSEL) and form an integral part of the Integrated Forecasting System (IFS), supporting a wide range of global weather, climate and environmental applications. In order to structure, coordinate and focus future developments and benefit from international collaboration in new areas, a flexible system named ECLand, which would facilitate modular extensions to support numerical weather prediction (NWP) and society-relevant operational services, for example, Copernicus, is presented. This paper introduces recent examples of novel ECLand developments on (i) vegetation; (ii) snow; (iii) soil; (iv) open water/lake; (v) river/inundation; and (vi) urban areas. The developments are evaluated separately with long-range, atmosphere-forced surface offline simulations and coupled land-atmosphere-ocean experiments. This illustrates the benchmark criteria for assessing both process fidelity with regards to land surface fluxes and reservoirs of the water-energy-carbon exchange on the one hand, and on the other hand the requirements of ECMWF’s NWP, climate and atmospheric composition monitoring services using an Earth system assimilation and prediction framework.
... Many centers such as NCEP [34], ECMWF [35] and the UK Met Office [36] use 4D-Var data assimilation. The success of the 4D-Var approach is crucially dependent on the construction of accurate tangent-linear and adjoint versions of every model component, especially for the physical parametrization schemes that represent the unresolved part of the model physics. Constructing these linearized models can either be achieved by manually differentiating the nonlinear code or by using an automatic differentiation tool [37]. ...
Preprint
Full-text available
There has been a lot of recent interest in developing hybrid models that couple deterministic numerical model components to statistical model components derived using machine learning techniques. One approach that we follow in this pilot study is to replace an existing computationally expensive deterministic model component with its fast machine-learning-based emulator, leading to the model speed-up and/or improvement. We developed a shallow neural network-based emulator of a complete suite of atmospheric physics parameterizations in NCEP Global Forecast System (GFS) general circulation model (GCM). The suite emulated by a single NN includes radiative transfer, cloud macro- and micro-physics, shallow and deep convection, boundary layer processes, gravity wave drag, land model, etc. NCEP GFS with the neural network replacing the original suite of atmospheric parameterizations produces stable and realistic medium range weather forecasts for 24 initial conditions spanning all months of 2018. It also remains stable in a year-long AMIP-like experiment and in the run with a quadrupled horizontal resolution. We present preliminary results of parallel runs, evaluating the accuracy and speedup of the resulting hybrid GCM.
... Thanks to increases in computing power and advances in scalability (e.g., Bauer et al., 2020), storm-resolving simulations with 3-5 km grid spacing in which deep convection is explicitly simulated (but not necessarily fully resolved) have become possible. A comprehensive overview of the history of global storm-resolving models can be found in Satoh et al. (2019), and only a few examples are highlighted here. ...
Article
Full-text available
Abstract In an attempt to advance the understanding of the Earth's weather and climate by representing deep convection explicitly, we present a global, four‐month simulation (November 2018 to February 2019) with ECMWF's hydrostatic Integrated Forecasting System (IFS) at an average grid spacing of 1.4 km. The impact of explicitly simulating deep convection on the atmospheric circulation and its variability is assessed by comparing the 1.4 km simulation to the equivalent well‐tested and calibrated global simulations at 9 km grid spacing with and without parametrized deep convection. The explicit simulation of deep convection at 1.4 km results in a realistic large‐scale circulation, better representation of convective storm activity, and stronger convective gravity wave activity when compared to the 9 km simulation with parametrized deep convection. Comparison of the 1.4 km simulation to the 9 km simulation without parametrized deep convection shows that switching off deep convection parametrization at a too coarse resolution (i.e., 9 km) generates too strong convective gravity waves. Based on the limited statistics available, improvements to the Madden‐Julian Oscillation or tropical precipitation are not observed at 1.4 km, suggesting that other Earth system model components and/or their interaction are important for an accurate representation of these processes and may well need adjusting at deep convection resolving resolutions. Overall, the good agreement of the 1.4 km simulation with the 9 km simulation with parametrized deep convection is remarkable, despite one of the most fundamental parametrizations being turned off at 1.4 km resolution and despite no adjustments being made to the remaining parametrizations.
... One reason for this is that the concept requires the observational data processing software to accommodate continuous streams of data. At ECMWF this has been a major undertaking over recent years (e.g., Bauer et al., 2020). Nevertheless, the concept has the potential to allow modern DA systems to operate more continuously and to better utilize the present form of the GOS. ...
Article
Full-text available
A new configuration of the European Centre for Medium‐Range Weather Forecasts (ECMWF) incremental 4D‐Var data assimilation (DA) system is introduced which builds upon the quasi‐continuous DA concept proposed in the mid‐1990s. Rather than working with a fixed set of observations, the new 4D‐Var configuration exploits the near‐continuous stream of incoming observations by introducing recently arrived observations at each outer loop iteration of the assimilation. This allows the analysis to benefit from more recent observations. Additionally, by decoupling the start time of the DA calculations from the observational data cut‐off time, real‐time forecasting applications can benefit from more expensive analysis configurations that previously could not have been considered. In this work we present results of a systematic comparison of the performance of a Continuous DA system against that of two more traditional baseline 4D‐Var configurations. We show that the quality of the analysis produced by the new, more continuous configuration is comparable to that of a conventional baseline that has access to all of the observations in each of the outer loops, which is a configuration not feasible in real‐time operational numerical weather prediction. For real‐time forecasting applications, the Continuous DA framework allows configurations which clearly outperform the best available affordable non‐continuous configuration. Continuous DA became operational at ECMWF in June 2019 and led to significant 2 to 3% reductions in medium‐range forecast root mean square errors, which is roughly equivalent to 2–3 hr of additional predictive skill.
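The essence of the continuous DA concept, folding observations that arrive between outer loops into the next analysis rather than waiting for a fixed cut-off, can be mimicked in a toy scalar example. The sketch below is purely illustrative (a scalar variational analysis with Gaussian errors and invented arrival batches), not ECMWF's incremental 4D-Var.

```python
import numpy as np

rng = np.random.default_rng(2)
truth, sigma_b, sigma_o = 1.0, 1.0, 0.5
x_b = truth + rng.normal(0.0, sigma_b)                       # background estimate

# Observations grouped by arrival time: batch k only becomes available before outer loop k.
arriving_batches = [truth + rng.normal(0.0, sigma_o, size=5) for _ in range(3)]

obs = np.empty(0)
for k, batch in enumerate(arriving_batches):
    obs = np.concatenate([obs, batch])                       # late arrivals join the analysis
    # Scalar variational analysis with the enlarged observation set (Gaussian errors).
    precision = 1.0 / sigma_b**2 + len(obs) / sigma_o**2
    x_a = (x_b / sigma_b**2 + obs.sum() / sigma_o**2) / precision
    print(f"outer loop {k}: {len(obs)} observations used, analysis = {x_a:.3f} (truth = {truth})")
```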
Article
The emergence of exascale computing and artificial intelligence offers tremendous potential to significantly advance earth system prediction capabilities. However, enormous challenges must be overcome to adapt models and prediction systems to use these new technologies effectively. A recent WMO report on exascale computing recommends “urgency in dedicating efforts and attention to disruptions associated with evolving computing technologies that will be increasingly difficult to overcome, threatening continued advancements in weather and climate prediction capabilities.” Further, the explosive growth in data from observations, model and ensemble output, and post-processing threatens to overwhelm the ability to deliver timely, accurate, and precise information needed for decision making. AI offers untapped opportunities to alter how models are developed, observations are processed, and predictions are analyzed and extracted for decision-making. Given the extraordinarily high cost of computing, growing complexity of prediction systems and increasingly unmanageable amount of data being produced and consumed, these challenges are rapidly becoming too large for any single institution or country to handle. This paper describes key technical and budgetary challenges, identifies gaps and ways to address them, and makes a number of recommendations.
Preprint
Full-text available
Earth system modeling and prediction stands at a crossroads. Exascale computing and artificial intelligence offer powerful new capabilities to advance earth system predictions. However, models, assimilation and data processing systems are increasingly unable to exploit these new technologies due to scientific, software, and computational limitations. Significant changes to the models, including algorithms, software and parallelism, are needed to run models efficiently on diverse exascale systems. While the rapidly emerging field of artificial intelligence offers significant potential, it remains unclear to what extent such technologies can be applied. A recent WMO report on exascale computing recommends "urgency in dedicating efforts and attention to disruptions associated with evolving computing technologies that will be increasingly difficult to overcome, threatening continued advancements in weather and climate prediction capabilities." Further, the explosive growth in data from observations, model and ensemble output, and post-processing threatens to overwhelm the ability to deliver timely, accurate, and precise information needed for decision making. Given the extraordinarily high cost of computing, growing complexity of prediction systems and increasingly unmanageable data being produced and consumed, these challenges are rapidly becoming too large for any single institution or country to handle. This paper describes key technical and budgetary challenges, identifies gaps and ways to address them, and makes a number of recommendations.
Article
Full-text available
Weather forecasting is important for science and society. At present, the most accurate forecast system is the numerical weather prediction (NWP) method, which represents atmospheric states as discretized grids and numerically solves partial differential equations that describe the transition between those states¹. However, this procedure is computationally expensive. Recently, artificial-intelligence-based methods² have shown potential in accelerating weather forecasting by orders of magnitude, but the forecast accuracy is still significantly lower than that of NWP methods. Here we introduce an artificial-intelligence-based method for accurate, medium-range global weather forecasting. We show that three-dimensional deep networks equipped with Earth-specific priors are effective at dealing with complex patterns in weather data, and that a hierarchical temporal aggregation strategy reduces accumulation errors in medium-range forecasting. Trained on 39 years of global data, our program, Pangu-Weather, obtains stronger deterministic forecast results on reanalysis data in all tested variables when compared with the world’s best NWP system, the operational integrated forecasting system of the European Centre for Medium-Range Weather Forecasts (ECMWF)³. Our method also works well with extreme weather forecasts and ensemble forecasts. When initialized with reanalysis data, the accuracy of tracking tropical cyclones is also higher than that of ECMWF-HRES.
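The hierarchical temporal aggregation mentioned above is generally described as a greedy combination of networks trained for different forecast steps, so that a given lead time is reached with as few model applications as possible. The sketch below assumes base models at 1, 3, 6 and 24 h lead times (as reported for Pangu-Weather) and illustrates the greedy scheduling idea only, not the authors' code.

```python
def aggregation_schedule(lead_time_hours, model_steps=(24, 6, 3, 1)):
    """Greedy hierarchical temporal aggregation: cover the requested lead time with as
    few model applications as possible by always taking the largest available step,
    which limits the accumulation of autoregressive errors."""
    schedule, remaining = [], lead_time_hours
    for step in sorted(model_steps, reverse=True):
        while remaining >= step:
            schedule.append(step)
            remaining -= step
    return schedule

print(aggregation_schedule(31))   # -> [24, 6, 1]: three model calls instead of 31 hourly steps
```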
Article
FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at 0.25° resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources, predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble-members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.
Article
Full-text available
Reducing the numerical precision of the forecast model of the Integrated Forecasting System (IFS) of the European Centre for Medium‐Range Weather Forecasts (ECMWF) from double to single precision results in significant computational savings without negatively affecting forecast accuracy. The computational savings make it possible to increase the vertical resolution of the operational ensemble forecasts from 91 to 137 levels earlier than anticipated and before the next upgrade of ECMWF's high‐performance computing facility. This upgrade to 137 levels harmonises the vertical resolution of the medium‐range deterministic forecasts and the medium‐range and extended‐range ensemble forecasts. Increasing the vertical resolution of the ensemble forecasts substantially improves forecast skill for all lead times as well as the mean of the model climate. ECMWF's ensemble and deterministic forecasts will run operationally at single precision from IFS model cycle 47R2 onwards.
Preprint
Full-text available
Hundreds of petabytes of data are produced annually at weather and climate forecast centres worldwide. Compression is inevitable to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data. We define the bitwise real information content from information theory for data from the Copernicus Atmospheric Monitoring Service (CAMS). Most variables contain less than 7 bits of real information per value, which are also highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. The entire CAMS data is compressed by a factor of 17x, relative to 64-bit floats, while preserving 99% of real information. Combined with 4-dimensional compression to exploit the spatio-temporal correlation, factors beyond 60x are achieved without an increase in forecast errors. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data.
Thesis
Full-text available
The skill of weather forecasts has improved dramatically over the past 30 years. This improvement has depended to a large degree on developments in supercomputing, which have allowed models to increase in complexity and resolution with minimal technical effort. However, the nature of supercomputing is undergoing a significant change, with the advent of extremely parallel and heterogeneous architectures. This paradigm shift threatens the continual increase of forecast skill and prompts a reevaluation of how Earth-System models are developed. In this thesis we explore the notion of reduced-precision arithmetic to accelerate Earth-System models, specifically those used in data assimilation and in numerical weather prediction. We first conduct data assimilation experiments with the Lorenz '96 toy atmospheric system, using the ensemble Kalman filter to perform assimilation. We reduce precision in the forecast and analysis steps of the ensemble Kalman filter and measure how this affects the quality of the data assimilation product, the analysis. We find that the optimal choice of precision is intimately linked with the degree of uncertainty from noisy observations and infrequent assimilation. We also find that precision can be traded for more ensemble members, and that this trade-off delivers a more accurate analysis than otherwise. We then consider the SPEEDY intermediate complexity atmospheric general circulation model, again with the ensemble Kalman filter. In this case we find that, in a perfect model setting, reducing precision in the forecast model gives an unacceptable degradation in the data assimilation product. However, we then show that even a modest degree of model error can mask the errors introduced by reducing precision. We consider also a precision reduction in the 4D-Var data assimilation scheme. We find that reducing precision increases the asymmetry between the tangent-linear and adjoint models, and that this retards the convergence of the minimisation scheme. However, with a standard reorthogonalisation procedure we are able to use single-precision, and even lower levels of precision, successfully. Finally, we consider the use of reduced-precision arithmetic to accelerate the Legendre transforms of an operational global weather forecasting model. We find that, with a few considerations of the algorithmic structure of the transforms and the physical meaning of the different components, we are able to use even half-precision without affecting the forecast skill of the model. In conclusion, we find that the errors introduced by reducing precision are negligible with respect to inherent errors in the forecasting system. In order to make optimal use of future supercomputers, reduced-precision arithmetic will be key.
Article
Full-text available
Mixed-precision approaches can provide substantial speed-ups for both computing- and memory-bound codes with little effort. Most scientific codes have overengineered the numerical precision, leading to a situation in which models are using more resources than required without knowing where they are required and where they are not. Consequently, it is possible to improve computational performance by establishing a more appropriate choice of precision. The only input that is needed is a method to determine which real variables can be represented with fewer bits without affecting the accuracy of the results. This paper presents a novel method that enables modern and legacy codes to benefit from a reduction of the precision of certain variables without sacrificing accuracy. It consists of a simple idea: we reduce the precision of a group of variables and measure how it affects the outputs. Then we can evaluate the level of precision that they truly need. Modifying and recompiling the code for each case that has to be evaluated would require a prohibitive amount of effort. Instead, the method presented in this paper relies on the use of a tool called a reduced-precision emulator (RPE) that can significantly streamline the process. Using the RPE and a list of parameters containing the precisions that will be used for each real variable in the code, it is possible within a single binary to emulate the effect on the outputs of a specific choice of precision. When we are able to emulate the effects of reduced precision, we can proceed with the design of the tests that will give us knowledge of the sensitivity of the model variables regarding their numerical precision. The number of possible combinations is prohibitively large and therefore impossible to explore. The alternative of performing a screening of the variables individually can provide certain insight about the required precision of variables, but, on the other hand, other complex interactions that involve several variables may remain hidden. Instead, we use a divide-and-conquer algorithm that identifies the parts that require high precision and establishes a set of variables that can handle reduced precision. This method has been tested using two state-of-the-art ocean models, the Nucleus for European Modelling of the Ocean (NEMO) and the Regional Ocean Modeling System (ROMS), with very promising results. Obtaining this information is crucial to build an actual mixed-precision version of the code in the next phase that will bring the promised performance benefits.
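The divide-and-conquer search can be sketched generically: demote a group of variables to reduced precision, keep the whole group if the outputs stay within tolerance, otherwise split it and recurse. The code below is a simplified illustration with a made-up accuracy_ok predicate, not the RPE-based workflow of the paper, and like any greedy split it can miss interactions between groups.

```python
def find_reducible(variables, accuracy_ok):
    """Return the subset of `variables` that can run at reduced precision.
    `accuracy_ok(group)` stands for rerunning the model with `group` demoted to
    reduced precision and checking the outputs against a tolerance."""
    if not variables:
        return []
    if accuracy_ok(variables):          # the whole group tolerates reduced precision
        return list(variables)
    if len(variables) == 1:             # a single failing variable must stay at high precision
        return []
    mid = len(variables) // 2           # otherwise split the group and recurse
    return find_reducible(variables[:mid], accuracy_ok) + find_reducible(variables[mid:], accuracy_ok)

# Toy usage: pretend only variables whose name contains "tmp" tolerate reduced precision.
demo_vars = ["ssh", "tmp_flux", "salinity", "tmp_diag"]
print(find_reducible(demo_vars, lambda group: all("tmp" in v for v in group)))
```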
Conference Paper
Full-text available
The next generation of weather and climate models will have an unprecedented level of resolution and model complexity, and running these models efficiently will require taking advantage of future supercomputers and heterogeneous hardware. In this paper, we investigate the use of mixed-precision hardware that supports floating-point operations at double-, single- and half-precision. In particular, we investigate the potential use of the NVIDIA Tensor Core, a mixed-precision matrix-matrix multiplier mainly developed for use in deep learning, to accelerate the calculation of the Legendre transforms in the Integrated Forecasting System (IFS), one of the leading global weather forecast models. In the IFS, the Legendre transform is one of the most expensive model components and dominates the computational cost for simulations at a very high resolution. We investigate the impact of mixed-precision arithmetic in IFS simulations of operational complexity through software emulation. Through a targeted but minimal use of double-precision arithmetic we are able to use either half-precision arithmetic or mixed half/single-precision arithmetic for almost all of the calculations in the Legendre transform without affecting forecast skill.
Article
Full-text available
Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intense, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry transport models. Our training data consist of 1 month (July 2013) of output of chemical conditions together with the model physical state, produced from the GEOS-Chem chemistry model v10. From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For ozone (O3), errors from using the random forests (compared to the reference simulation) grow slowly and after 5 days the normalized mean bias (NMB), root mean square error (RMSE) and R² are 4.2 %, 35 % and 0.9, respectively; after 30 days the errors increase to 13 %, 67 % and 0.75, respectively. The biases become largest in remote areas such as the tropical Pacific where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10 % and has significant fidelity in following the time series of the full model. Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R² reaching >2100 %, >400 % and <0.1, respectively. This proof-of-concept implementation takes 1.8 times more time than the direct integration of the differential equations, but optimization and software engineering should allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations, and its applicability to operational air quality activities.
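The prediction-type choice described above (tendency for long-lived species, absolute concentration for short-lived ones) maps directly onto how the regression target is built. The sketch below uses scikit-learn's RandomForestRegressor on synthetic data; the feature set, species chemistry and train/test split are invented for illustration and bear no relation to the GEOS-Chem setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Synthetic stand-in: features describe the physical/chemical state before the chemical
# integrator, the target is the concentration of one species after it.
n = 2000
X = rng.standard_normal((n, 8))                       # e.g. temperature, photolysis rates, precursors
c_before = np.exp(rng.standard_normal(n))             # concentration before the integrator
c_after = c_before * np.exp(0.1 * X[:, 0]) + 0.05 * rng.standard_normal(n)

long_lived = True                                     # tendency target for long-lived species,
target = (c_after - c_before) if long_lived else c_after  # absolute concentration for short-lived ones

features = np.column_stack([X, c_before])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(features[:1500], target[:1500])
pred = model.predict(features[1500:])
c_pred = c_before[1500:] + pred if long_lived else pred
print("RMSE:", np.sqrt(np.mean((c_pred - c_after[1500:]) ** 2)))
```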
Article
Full-text available
A study of the scalability of the Finite-volumE Sea ice-Ocean circulation Model, Version 2.0 (FESOM2), the first mature global model of its kind formulated on unstructured meshes, is presented. This study includes an analysis of the main computational kernels with a special focus on bottlenecks in parallel scalability. Several model enhancements improving this scalability for large numbers of processes are described and tested. Model grids at different resolutions are used on four HPC systems with differing computation and communication hardware to demonstrate the model's scalability and throughput. Furthermore, strategies for improvements in parallel performance are presented and assessed. We show that in terms of throughput FESOM2.0 is on par with state-of-the-art structured ocean models and in a realistic eddy-resolving configuration (1/10° resolution) can produce about 16 years per day on 14 000 cores. This suggests that unstructured-mesh models are becoming extremely competitive tools in high-resolution climate modelling. It is shown that the main bottlenecks of FESOM's parallel scalability are the two-dimensional components of the model, namely the computations of the external (barotropic) mode and the sea-ice model. It is argued that these bottlenecks are shared with other general ocean circulation models.
Article
Full-text available
This paper describes the splitting supercell idealized test case used in the 2016 Dynamical Core Model Intercomparison Project (DCMIP2016). These storms are useful test beds for global atmospheric models because the horizontal scale of convective plumes is O(1 km), emphasizing non-hydrostatic dynamics. The test case simulates a supercell on a reduced-radius sphere with nominal resolutions ranging from 4 to 0.5 km and is based on the work of Klemp et al. (2015). Models are initialized with an atmospheric environment conducive to supercell formation and forced with a small thermal perturbation. A simplified Kessler microphysics scheme is coupled to the dynamical core to represent moist processes. Reference solutions for DCMIP2016 models are presented. Storm evolution is broadly similar between models, although differences in the final solution exist. These differences are hypothesized to result from different numerical discretizations, physics–dynamics coupling, and numerical diffusion. Intramodel solutions generally converge as models approach 0.5 km resolution, although exploratory simulations at 0.25 km imply some dynamical cores require more refinement to fully converge. These results can be used as a reference for future dynamical core evaluation, particularly with the development of non-hydrostatic global models intended to be used in convective-permitting regimes.
Article
Full-text available
Recent progress in machine learning has shown how to forecast and, to some extent, learn the dynamics of a model from its output, resorting in particular to neural networks and deep learning techniques. We will show how the same goal can be directly achieved using data assimilation techniques, without relying on machine learning software libraries, with a view to high-dimensional models. The dynamics of a model are learned from observations of it, and an ordinary differential equation (ODE) representation of this model is inferred using a recursive nonlinear regression. Because the method is embedded in a Bayesian data assimilation framework, it can learn from partial and noisy observations of a state trajectory of the physical model. Moreover, a space-wise local representation of the ODE system is introduced and is key to coping with high-dimensional models. It has recently been suggested that neural network architectures could be interpreted as dynamical systems. Reciprocally, we show that our ODE representations are reminiscent of deep learning architectures. Furthermore, numerical analysis considerations on stability shed light on the assets and limitations of the method. The method is illustrated on several chaotic discrete and continuous models of various dimensions, with or without noisy observations, with the goal of identifying or improving the model dynamics, building a surrogate or reduced model, or producing forecasts from mere observations of the physical model.
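As a loose illustration of the idea of inferring an ODE representation from observed trajectories, the sketch below fits the coefficients of a polynomial right-hand side by regressing finite-difference tendencies on monomial features. This is a deliberately simplified stand-in (plain least squares on a Lorenz-63 trajectory) for the recursive nonlinear regression embedded in the Bayesian data assimilation framework that the paper actually develops.

```python
# Simplified sketch: learn dx/dt = phi(x) @ coeffs from a noisy trajectory.
import numpy as np

def lorenz63(x, s=10.0, r=28.0, b=8.0/3.0):
    return np.array([s*(x[1]-x[0]), x[0]*(r-x[2])-x[1], x[0]*x[1]-b*x[2]])

dt, n = 0.002, 20_000
traj = np.empty((n, 3)); traj[0] = [1.0, 1.0, 1.0]
for k in range(n - 1):                        # simple Euler integration of the "true" model
    traj[k+1] = traj[k] + dt*lorenz63(traj[k])
obs = traj + 0.01*np.random.default_rng(1).standard_normal(traj.shape)  # noisy observations

def features(X):
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([x, y, z, x*y, x*z, y*z])   # subset of degree-2 monomials

dXdt = (obs[1:] - obs[:-1]) / dt                       # finite-difference tendencies
coeffs, *_ = np.linalg.lstsq(features(obs[:-1]), dXdt, rcond=None)

def learned_rhs(x):
    """ODE representation inferred from the observations."""
    return features(x[None, :])[0] @ coeffs
```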
Article
Full-text available
We present a nonhydrostatic finite-volume global atmospheric model formulation for numerical weather prediction with the Integrated Forecasting System (IFS) at ECMWF and compare it to the established operational spectral-transform formulation. The novel Finite-Volume Module of the IFS (henceforth IFS-FVM) integrates the fully compressible equations using semi-implicit time stepping and non-oscillatory forward-in-time (NFT) Eulerian advection, whereas the spectral-transform IFS solves the hydrostatic primitive equations (optionally the fully compressible equations) using a semi-implicit semi-Lagrangian scheme. The IFS-FVM complements the spectral-transform counterpart by means of the finite-volume discretization with a local low-volume communication footprint, fully conservative and monotone advective transport, all-scale deep-atmosphere fully compressible equations in a generalized height-based vertical coordinate, and flexible horizontal meshes. Nevertheless, both the finite-volume and spectral-transform formulations can share the same quasi-uniform horizontal grid with co-located arrangement of variables, geospherical longitude-latitude coordinates, and physics parameterizations, thereby facilitating their comparison, coexistence, and combination in the IFS. We highlight the advanced semi-implicit NFT finite-volume integration of the fully compressible equations of IFS-FVM considering comprehensive moist-precipitating dynamics with coupling to the IFS cloud parameterization by means of a generic interface. These developments - including a new horizontal-vertical split NFT MPDATA advective transport scheme, variable time stepping, effective preconditioning of the elliptic Helmholtz solver in the semi-implicit scheme, and a computationally efficient implementation of the median-dual finite-volume approach - provide a basis for the efficacy of IFS-FVM and its application in global numerical weather prediction. Here, numerical experiments focus on relevant dry and moist-precipitating baroclinic instability at various resolutions. We show that the presented semi-implicit NFT finite-volume integration scheme on co-located meshes of IFS-FVM can provide highly competitive solution quality and computational performance to the proven semi-implicit semi-Lagrangian integration scheme of the spectral-transform IFS.
Article
Full-text available
Variational data assimilation methods are reviewed and compared in the Met Office global numerical weather prediction system. This supports hybrid background error covariances, which are a weighted combination of modelled static "climatological" covariances with covariances calculated from a current ensemble of forecasts, in both 3-dimensional and 4-dimensional methods. For the latter, we compare the use of linear and adjoint models (hybrid-4DVar) with the direct use of ensemble forecast trajectories (hybrid-4DEnVar). Earlier studies had shown that hybrid-4DVar outperforms hybrid-4DEnVar, and 4DVar outperforms 3DVar. Improvements in the processing of ensemble covariances and computer enhancements mean we are now able to explore these comparisons for the full range of hybrid weights. We find that, using our operational 44-member ensemble, the static covariance is still beneficial in hybrid-4DVar, so that it significantly outperforms hybrid-4DEnVar. In schemes not using linear and adjoint models, the static covariance is less beneficial. It is shown that the time-propagated static covariance is the main cause of the better performance of 4DVar; when using pure ensemble covariances, 4DVar and 4DEnVar show similar skill. These results are consistent with nonlinear dynamics theory about assimilation in the unstable subspace.
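The hybrid covariance itself is conceptually simple; a schematic (with assumed weights and an optional element-wise localization matrix, not the Met Office implementation) is:

```python
# Schematic hybrid background-error covariance: weighted sum of a static
# climatological covariance and a (localized) ensemble sample covariance.
import numpy as np

def hybrid_covariance(B_static, ensemble, beta_static=0.5, beta_ens=0.5, loc=None):
    """ensemble: array of shape (n_members, n_state)."""
    perturbations = ensemble - ensemble.mean(axis=0)
    P_ens = perturbations.T @ perturbations / (ensemble.shape[0] - 1)
    if loc is not None:                       # Schur (element-wise) localization
        P_ens = P_ens * loc
    return beta_static**2 * B_static + beta_ens**2 * P_ens
```

In operational systems the covariances are typically not formed explicitly for high-dimensional states; the weighting is applied implicitly through control-variable transforms.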
Article
Full-text available
The use of reduced numerical precision within an atmospheric data assimilation system is investigated. An atmospheric model with a spectral dynamical core is used to generate synthetic observations, which are then assimilated back into the same model using an ensemble Kalman filter. The effect on the analysis error of reducing precision from 64 bits to only 22 bits is measured and found to depend strongly on the degree of model uncertainty within the system. When the model used to generate the observations is identical to the model used to assimilate observations, the reduced-precision results suffer substantially. However, when model error is introduced by changing the diffusion scheme in the assimilation model or by using a higher-resolution model to generate observations, the difference in analysis quality between the two levels of precision is almost eliminated. Lower-precision arithmetic has a lower computational cost, so lowering precision could free up computational resources in operational data assimilation and allow an increase in ensemble size or grid resolution.
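Such precision experiments are often carried out by emulating low precision in software. A minimal sketch of one way to do this (simple truncation of the IEEE-754 float64 significand; an operational emulator would typically round to nearest rather than truncate) is:

```python
# Keep only the leading `keep_bits` of the 52-bit float64 significand.
import numpy as np

def truncate_significand(x, keep_bits=22):
    bits = np.asarray(x, dtype=np.float64).view(np.uint64)
    mask = np.uint64(0xFFFFFFFFFFFFFFFF) << np.uint64(52 - keep_bits)
    return (bits & mask).view(np.float64)

x = np.array([3.141592653589793, 2.718281828459045])
print(truncate_significand(x, keep_bits=22))   # values with reduced significand precision
```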
Article
Full-text available
The Korea Institute of Atmospheric Prediction Systems (KIAPS) began a national project to develop a new global atmospheric model system in 2011. The ultimate goal of this 9-year project is to replace the current operational model at the Korea Meteorological Administration (KMA), which was adopted from the United Kingdom Met Office's Unified Model (UM) in 2010. The 12-km Korean Integrated Model (KIM) system, consisting of a spectral-element non-hydrostatic dynamical core on a cubed-sphere grid and a state-of-the-art physics parameterization package, has been launched in a real-time forecast framework, with initial conditions obtained via an advanced hybrid four-dimensional ensemble-variational data assimilation (4DEnVar) over its native grid. A development strategy for KIM and the evolution of its performance in medium-range forecasts toward a world-class global forecast system are described. Outstanding issues in KIM 3.1 as of February 2018 are discussed, along with a future plan for operational deployment in 2020.
Article
Full-text available
Can models that are based on deep learning and trained on atmospheric data compete with weather and climate models that are based on physical principles and the basic equations of motion? This question has been asked often recently due to the boom of deep learning techniques. The question is valid given the huge amount of data that is available, the computational efficiency of deep learning techniques and the limitations of today's weather and climate models in particular with respect to resolution and complexity. In this paper, the question will be discussed in the context of global weather forecasts. A toy-model for global weather predictions will be presented and used to identify challenges and fundamental design choices for a forecast system based on Neural Networks.
Article
Full-text available
Atmospheric dynamical cores are a fundamental component of global atmospheric modeling systems and are responsible for capturing the dynamical behavior of the Earth's atmosphere via numerical integration of the Navier–Stokes equations. These systems have existed in one form or another for over half of a century, with the earliest discretizations having now evolved into a complex ecosystem of algorithms and computational strategies. In essence, no two dynamical cores are alike, and their individual successes suggest that no perfect model exists. To better understand modern dynamical cores, this paper aims to provide a comprehensive review of 11 non-hydrostatic dynamical cores, drawn from modeling centers and groups that participated in the 2016 Dynamical Core Model Intercomparison Project (DCMIP) workshop and summer school. This review includes a choice of model grid, variable placement, vertical coordinate, prognostic equations, temporal discretization, and the diffusion, stabilization, filters, and fixers employed by each system.
Article
Full-text available
This paper presents an application of GPU accelerators in Earth system modeling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate–chemistry model simulations. We developed a software package that automatically generates CUDA kernels to numerically integrate atmospheric chemical kinetics in the global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC), used to study climate change and air quality scenarios. A source-to-source compiler outputs a CUDA-compatible kernel by parsing the FORTRAN code generated by the Kinetic PreProcessor (KPP) general analysis tool. All Rosenbrock methods that are available in the KPP numerical library are supported. Performance evaluation, using Fermi and Pascal CUDA-enabled GPU accelerators, shows achieved speed-ups of 4.5× and 20.4×, respectively, in kernel execution time. A node-to-node real-world production performance comparison shows a 1.75× speed-up over the non-accelerated application using the KPP three-stage Rosenbrock solver. We provide a detailed description of the code optimizations used to improve the performance, including memory optimizations, control code simplification, and reduction of idle time. The accuracy and correctness of the accelerated implementation are evaluated by comparing to the CPU-only code of the application. The median relative difference is found to be less than 0.000000001 % when comparing the output of the accelerated kernel with that of the CPU-only code. The approach followed, including the computational workload division, and the developed GPU solver code can potentially be used as the basis for hardware acceleration of numerous geoscientific models that rely on KPP for atmospheric chemical kinetics applications.
Article
Full-text available
The best hope for reducing long-standing global climate model biases is through increasing the resolution to the kilometer scale. Here we present results from an ultra-high-resolution non-hydrostatic climate model for a near-global setup running on the full Piz Daint supercomputer on 4888 GPUs. The dynamical core of the model has been completely rewritten using a domain-specific language (DSL) for performance portability across different hardware architectures. Physical parameterizations and diagnostics have been ported using compiler directives. To our knowledge, this represents the first complete atmospheric model being run entirely on accelerators at this scale. At a grid spacing of 930 m (1.9 km), we achieve a simulation throughput of 0.043 (0.23) simulated years per day and an energy consumption of 596 MWh per simulated year. Furthermore, we propose a new memory usage efficiency metric that considers how efficiently the memory bandwidth – the dominant bottleneck of climate codes – is being used.
Article
Full-text available
The algorithms underlying numerical weather prediction (NWP) and climate models that have been developed in the past few decades face an increasing challenge caused by the paradigm shift imposed by hardware vendors towards more energy-efficient devices. In order to provide a sustainable path to exascale High Performance Computing (HPC), applications become increasingly restricted by energy consumption. As a result, the emerging diverse and complex hardware solutions have a large impact on the programming models traditionally used in NWP software, triggering a rethink of design choices for future massively parallel software frameworks. In this paper, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the aim of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models to specific tools required for post-processing weather forecast products. The Atlas library thus constitutes a step towards affordable exascale high-performance simulations by providing the necessary abstractions that facilitate its application in heterogeneous HPC environments and promote the co-design of NWP algorithms with the underlying hardware.
Article
Full-text available
Stencil computations occur in a multitude of scientific simulations and therefore have been the subject of many domain-specific languages including the OPS (Oxford Parallel library for Structured meshes) DSL embedded in C/C++/Fortran. OPS is currently used in several large partial differential equations (PDE) applications, and has been used as a vehicle to experiment with, and deploy performance improving optimisations. The key common bottleneck in most stencil codes is data movement, and other research has shown that improving data locality through optimisations that schedule across loops do particularly well. However, in many large PDE applications it is not possible to apply such optimisations through a compiler because in larger-scale codes, there are a huge number of options, execution paths and data per grid point, many dependent on run-time parameters, and the code is distributed across a number of different compilation units. In this paper, we adapt the data locality improving optimisation called iteration space slicing for use in large OPS apps, relying on run-time analysis and delayed execution. We observe speedups of 2x on the Cloverleaf 2D/3D proxy application, which contain 83/141 loops respectively. The approach is generally applicable to any stencil DSL that provides per loop data access information.
Article
Full-text available
The use of reduced numerical precision to reduce computing costs for the cloud-resolving model of superparameterised simulations of the atmosphere is investigated. An approach to identify the optimal level of precision for many different model components is presented and a detailed analysis of precision is performed. This is non-trivial for a complex model that shows chaotic behaviour, such as the cloud-resolving model in this paper. It is shown not only that numerical precision can be reduced significantly but also that the results of the reduced-precision analysis provide valuable information for the quantification of model uncertainty for individual model components. The precision analysis is also used to identify model parts that are of less importance, thus enabling a reduction of model complexity. It is shown that the precision analysis can be used to improve model efficiency for both simulations in double precision and in reduced precision. Model simulations are performed with a superparameterised single-column model version of the OpenIFS model that is forced by observational datasets. A software emulator was used to mimic the use of reduced-precision floating-point arithmetic in the simulations.
Article
Full-text available
A study is made of the limits imposed on variational assimilation of observations by the chaotic character of the atmospheric flow. The primary goal of the study is to determine to what degree, and how, the knowledge of past noisy observations can improve the knowledge of the present state of a chaotic system. The study is made under the hypothesis of a perfect model. Theoretical results are illustrated by numerical experiments performed with the classical three-variable system introduced by Lorenz. Both theoretical and numerical results show that, even in the chaotic regime, appropriate use of past observations improves the accuracy of the estimate of the present state of the flow. However, the resulting estimation error mostly projects onto the unstable modes of the system, and the corresponding gain in predictability is limited. Theoretical considerations provide explicit estimates of the statistics of the assimilation error. The error depends on the state of the flow over the assimilation period. It is largest when there has been a period of strong instability in the very recent past. In the limit of infinitely long assimilation periods, the behaviour of the cost function of variational assimilation is singular: it tends to fold into deep narrow "valleys" parallel to the sheets of the unstable manifold of the system. An unbounded number of secondary minima appear, where solutions of minimization algorithms can be trapped. The absolute minimum of the cost function always lies on the sheet of the unstable manifold containing the exact state of the flow. But the error along the unstable manifold saturates to a finite value, and the absolute minimum of the cost function does not, in general, converge to the exact state of the flow. Even so, the absolute minimum of the cost function is the best estimate that can be obtained of the state of the flow. An algorithm is proposed, the quasi-static variational assimilation, for determining the absolute minimum, based on successive small increments of the assimilation period and quasi-static adjustments of the minimizing solution. Finally, the impact of assimilation on predictability is assessed by forecast experiments with that system. The value of the present paper lies mainly in the qualitative results it presents. Quantitative estimates relevant for the atmosphere call for further studies. DOI: 10.1034/j.1600-0870.1996.00006.x
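The setting can be made concrete with a small variational cost function for the Lorenz (1963) system under the perfect-model hypothesis. The sketch below is schematic (Euler integration, full-state observations, a derivative-free minimizer) and does not implement the quasi-static algorithm proposed in the paper:

```python
# Schematic strong-constraint variational assimilation for Lorenz-63.
import numpy as np
from scipy.optimize import minimize

def lorenz63(x, s=10.0, r=28.0, b=8.0/3.0):
    return np.array([s*(x[1]-x[0]), x[0]*(r-x[2])-x[1], x[0]*x[1]-b*x[2]])

def integrate(x0, nsteps, dt=0.01):
    xs, x = [np.array(x0, float)], np.array(x0, float)
    for _ in range(nsteps):
        x = x + dt*lorenz63(x)                 # simple Euler step for brevity
        xs.append(x.copy())
    return np.array(xs)

truth = integrate([1.0, 1.0, 1.0], 200)        # state trajectory over the window
obs = truth + 0.5*np.random.default_rng(0).standard_normal(truth.shape)
R_inv = 1.0/0.25                               # inverse observation-error variance (0.5**2)

def cost(x0):
    """J(x0): misfit of the trajectory started from x0 to the past observations."""
    misfit = integrate(x0, 200) - obs
    return 0.5*R_inv*np.sum(misfit**2)

x0_est = minimize(cost, x0=[0.0, 0.0, 0.0], method="Nelder-Mead").x
```

For long windows this cost function develops the many narrow secondary minima discussed above, which is precisely what motivates the quasi-static strategy of lengthening the assimilation window gradually.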
Article
Full-text available
A new global model with a non-hydrostatic (NH) dynamical core is developed. It employs the spectral element method (SEM) in the horizontal discretization and the finite difference method (FDM) in the vertical discretization. The solver includes a time-split third-order Runge-Kutta (RK3) time-integration technique. Pursuing the quasi-uniform and pole singularity-free spherical geometry, a cubed-sphere grid is employed. To assess the performance of the developed dynamical solver, the results from a number of idealized benchmark tests for hydrostatic and non-hydrostatic flows are presented and compared. The results indicate that the non-hydrostatic dynamical solver is able to produce solutions with good accuracy and consistency comparable to reference solutions. Further evaluation of the model with a full-physics package demonstrates its capability in reproducing heavy rainfall over the Korean Peninsula, which confirms that coupling of the dynamical solver and full-physics package is robust.
Article
Full-text available
The steady path of doubling the global horizontal resolution approximately every 8 years in numerical weather prediction (NWP) at the European Centre for Medium-Range Weather Forecasts may be substantially altered with emerging novel computing architectures. It coincides with the need to appropriately address and determine forecast uncertainty with increasing resolution, in particular when convective-scale motions start to be resolved. Blunt increases in the model resolution will quickly become unaffordable and may not lead to improved NWP forecasts. Consequently, there is a need to adjust proven numerical techniques accordingly. An informed decision on the modelling strategy for harnessing exascale, massively parallel computing power thus also requires a deeper understanding of the sensitivity to uncertainty, for each part of the model, and ultimately a deeper understanding of multi-scale interactions in the atmosphere and their numerical realization in ultra-high-resolution NWP and climate simulations. This paper explores opportunities for substantial increases in the forecast efficiency by judicious adjustment of the formal accuracy or relative resolution in the spectral and physical space. One path is to reduce the formal accuracy by which the spectral transforms are computed. The other pathway explores the importance of the ratio used for the horizontal resolution in gridpoint space versus wavenumbers in spectral space. This is relevant for both high-resolution simulations and ensemble-based uncertainty estimation.
Article
Full-text available
We now have 20 years of data under our belt about the performance of supercomputers against at least a single floating-point benchmark from dense linear algebra. Until about 2004, a single model of parallel programming, bulk synchronous using the MPI model, was sufficient to permit translation into reasonable parallel programs for more complex applications. Starting in 2004, however, a confluence of events changed forever the architectural landscape that underpinned MPI. The first half of this article goes into the underlying reasons for these changes, and what they mean for system architectures. The second half then addresses the view going forward in terms of our standard scaling models and their profound implications for future programming and algorithm design.
Article
Full-text available
Empirical or statistical methods have been introduced into meteorology and oceanography in four distinct stages: 1) linear regression (and correlation), 2) principal component analysis (PCA), 3) canonical correlation analysis, and recently 4) neural network (NN) models. Despite the great popularity of the NN models in many fields, there are three obstacles to adapting the NN method to meteorology-oceanography, especially in large-scale, low-frequency studies: (a) nonlinear instability with short data records, (b) large spatial data fields, and (c) difficulties in interpreting the nonlinear NN results. Recent research shows that these three obstacles can be overcome. For obstacle (a), ensemble averaging was found to be effective in controlling nonlinear instability. For (b), the PCA method was used as a prefilter for compressing the large spatial data fields. For (c), the mysterious hidden layer could be given a phase space interpretation, and spectral analysis aided in understanding the nonlinear NN relations. With these and future improvements, the nonlinear NN method is evolving to a versatile and powerful technique capable of augmenting traditional linear statistical methods in data analysis and forecasting; for example, the NN method has been used for El Niño prediction and for nonlinear PCA. The NN model is also found to be a type of variational (adjoint) data assimilation, which allows it to be readily linked to dynamical models under adjoint data assimilation, resulting in a new class of hybrid neural-dynamical models.
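A minimal sketch of the PCA-prefiltering and ensemble-averaging ingredients described above (hypothetical data shapes; scikit-learn used purely for illustration, not the tools of the original studies):

```python
# PCA prefilter to compress a large spatial field, then an ensemble of small
# neural networks whose average controls nonlinear instability.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
fields = rng.random((500, 10_000))             # e.g. 500 gridded anomaly maps (placeholder data)
target = rng.random(500)                       # e.g. a climate index several months ahead

pca = PCA(n_components=10).fit(fields)         # (b) compress the large spatial field
pcs = pca.transform(fields)

ensemble = [MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=i)
            .fit(pcs, target) for i in range(10)]      # (a) ensemble of small networks
prediction = np.mean([m.predict(pca.transform(fields[:5])) for m in ensemble], axis=0)
```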
Article
Full-text available
The authors have investigated the possibility of elaborating a new generation of radiative transfer models for climate studies based on the neural network technique. The authors show that their neural network-based model, NeuroFlux, can be used successfully for accurately deriving the longwave radiative budget from the top of the atmosphere to the surface. The reliable sampling of the earth's atmospheric situations in the new version of the TIGR (Thermodynamic Initial Guess Retrieval) dataset, developed at the Laboratoire de Météorologie Dynamique, allows for an efficient learning of the neural networks. Two radiative transfer models are applied to the computation of the radiative part of the dataset: a line-by-line model and a band model. These results have been used to infer the parameters of two neural network-based radiative transfer codes. Both of them achieve an accuracy comparable to, if not better than, the current general circulation model radiative transfer codes, and they are much faster. The dramatic saving of computing time offered by the neural network technique (22 times faster than the band model and 10⁶ times faster than the line-by-line model) allows for an improved estimation of the longwave radiative properties of the atmosphere in general circulation model simulations.
Article
This paper describes LFRic: the new weather and climate modelling system being developed by the UK Met Office to replace the existing Unified Model in preparation for exascale computing in the 2020s. LFRic uses the GungHo dynamical core and runs on a semi-structured cubed-sphere mesh. The design of the supporting infrastructure follows object-oriented principles to facilitate modularity and the use of external libraries where possible. In particular, a ‘separation of concerns’ between the science code and parallel code is imposed to promote performance portability. An application called PSyclone, developed at the STFC Hartree centre, can generate the parallel code enabling deployment of a single source science code onto different machine architectures. This paper provides an overview of the scientific requirement, the design of the software infrastructure, and examples of PSyclone usage. Preliminary performance results show strong scaling and an indication that hybrid MPI/OpenMP performs better than pure MPI.
Article
Data storage and data processing generate significant cost for weather and climate modeling centers. The volume of data that needs to be stored and data that are disseminated to end users increases with increasing model resolution and the use of larger forecast ensembles. If precision of data is reduced, cost can be reduced accordingly. In this paper, three new methods to allow a reduction in precision with minimal loss of information are suggested and tested. Two of these methods rely on the similarities between ensemble members in ensemble forecasts. Therefore, precision will be high at the beginning of forecasts when ensemble members are more similar, to provide sufficient distinction, and decrease with increasing ensemble spread. To keep precision high for predictable situations and low elsewhere appears to be a useful approach to optimize data storage in weather forecasts. All methods are tested with data of operational weather forecasts of the European Centre for Medium-Range Weather Forecasts.
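One of several possible realizations of the idea is sketched below. The keep-bits criterion here is a heuristic invented for illustration (more bits where the ensemble spread is small relative to the mean), not one of the three methods proposed in the paper, and simple truncation is used instead of proper rounding.

```python
# Heuristic sketch: choose significand bits per grid point from the ensemble
# spread, then zero the trailing bits of float32 values before storage.
import numpy as np

def spread_based_keepbits(ensemble, min_bits=2, max_bits=23):
    """ensemble: (n_members, n_points). Returns keep-bits per grid point."""
    mean = ensemble.mean(axis=0)
    spread = ensemble.std(axis=0) + 1e-30
    bits = np.ceil(np.log2(np.abs(mean)/spread + 1.0)).astype(int) + 1
    return np.clip(bits, min_bits, max_bits)

def round_keepbits(x, keepbits):
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    mask = np.uint32(0xFFFFFFFF) << (23 - keepbits).astype(np.uint32)
    return (bits & mask).view(np.float32)

ens = (np.random.default_rng(0).standard_normal((50, 1000)) + 5.0).astype(np.float32)
stored = round_keepbits(ens.mean(axis=0), spread_based_keepbits(ens))
```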
Chapter
Performance analysis tools are essential in the process of understanding application behavior, identifying critical performance issues and adapting applications to new architectures and increasingly large HPC systems. State-of-the-art tools provide extensive functionality and a plenitude of specialized analysis capabilities. At the same time, coping with the complexity of the potential performance issues, and sometimes with the tools themselves, remains challenging, especially for non-experts. In particular, identifying the main issues within the overwhelming amount of data and tool options, as well as quantifying their impact and potential for improvement, can be tedious and time-consuming. In this paper we present a structured approach to performance analysis used within the EU Centre of Excellence for Performance Optimization and Productivity (POP). The structured approach features a method to get a general overview, determine the focus of the analysis, and identify the main issues and areas for potential improvement with a statistical performance model that leads to starting points for a subsequent in-depth analysis. All steps of the structured approach are accompanied by corresponding tools from the BSC tool suite and illustrated with an exemplary performance analysis.
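The top level of that statistical model reduces to a handful of multiplicative efficiencies. The sketch below follows the commonly cited POP definitions (load balance times communication efficiency gives parallel efficiency); the full hierarchy also includes computation-scaling terms not shown here.

```python
# POP-style top-level efficiencies from per-process useful compute time.
import numpy as np

def pop_efficiencies(useful_compute_seconds, runtime_seconds):
    useful = np.asarray(useful_compute_seconds, dtype=float)
    load_balance = useful.mean() / useful.max()
    communication_efficiency = useful.max() / runtime_seconds
    parallel_efficiency = load_balance * communication_efficiency   # = mean(useful)/runtime
    return {"load_balance": load_balance,
            "communication_efficiency": communication_efficiency,
            "parallel_efficiency": parallel_efficiency}

print(pop_efficiencies([9.0, 8.5, 7.0, 8.0], runtime_seconds=10.0))
```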
Article
We present a roadmap towards exascale computing based on true application performance goals. It is based on two state-of-the art European numerical weather prediction models (IFS from ECMWF and COSMO from MeteoSwiss) and their current performance when run at very high spatial resolution on present-day supercomputers. We conclude that these models execute about 100-250 times too slow for operational throughput rates at a horizontal resolution of 1 km, even when executed on a full petascale system with nearly 5,000 state-of-the-art hybrid GPU-CPU nodes. Our analysis of the performance in terms of a metric that assesses the efficiency of memory use shows a path to improve the performance of hardware and software in order to meet operational requirements early next decade.
Conference Paper
In order to profit from emerging high-performance computing systems, weather and climate models need to be adapted to run efficiently on different hardware architectures such as accelerators. This is a major challenge for existing community models that represent very large code bases written in Fortran. We introduce the CLAW domain-specific language (CLAW DSL) and the CLAW Compiler, which allow the retention of a single code base written in Fortran while achieving a high degree of performance portability. Specifically, we present the Single Column Abstraction (SCA) of the CLAW DSL that is targeted at the column-based algorithmic motifs typically encountered in the physical parameterizations of weather and climate models. Starting from a serial and non-optimized source code, the CLAW Compiler applies transformations and optimizations for a specific target hardware architecture and generates parallel optimized Fortran code annotated with OpenMP or OpenACC directives. Results from a state-of-the-art radiative transfer code indicate that, using CLAW, the amount of source code can be significantly reduced while achieving efficient code for x86 multi-core CPUs and GPU accelerators. The CLAW DSL is a significant step towards performance-portable climate and weather models and could be adopted incrementally in existing code with limited effort.
Article
The effect of preconditioning linear weighted least-squares using an approximation of the model matrix is analyzed. The aim is to investigate from a theoretical point of view the inefficiencies of this approach as observed in the application of the weakly-constrained 4D-Var algorithm in geosciences. Bounds on the eigenvalues of the preconditioned system matrix are provided. It highlights the interplay of the eigenstructures of both the model and weighting matrices: maintaining a low bound on the eigenvalues of the preconditioned system matrix requires an approximation error of the model matrix that compensates for the condition number of the weighting matrix. A low-dimension analytical example is given illustrating the resulting potential inefficiency of such preconditioners. Consequences of these results in the context of the state formulation of the weakly-constrained 4D-Var data assimilation problem are finally discussed. It is shown that the common approximations of the tangent linear model that maintain parallelization-in-time properties (identity or null matrix) can result in large bounds on the eigenvalues of the preconditioned matrix system.
Article
This paper discusses the practical use of the saddle variational formulation for the weakly-constrained 4D-Var method in data assimilation. It is shown that the method, in its original form, may produce erratic results or diverge because of the inherent lack of monotonicity of the produced objective function values. Convergent, variationally coherent variants of the algorithm are then proposed, whose practical performance is compared to that of other formulations. This comparison is conducted on two data assimilation instances (Burgers equation and the Quasi-Geostrophic model), using two different assumptions about the parallel computing environment. Because these variants essentially retain the parallelization advantages of the original proposal, they often, but not always, perform best, even for moderate numbers of computing processes.
Article
The current evolution of computer architectures towards increasing parallelism requires a corresponding evolution towards more parallel data assimilation algorithms. In this article, we consider parallelization of weak-constraint four-dimensional variational data assimilation (4D-Var) in the time dimension. We categorize algorithms according to whether or not they admit such parallelization and introduce a new, highly parallel weak-constraint 4D-Var algorithm based on a saddle-point representation of the underlying optimization problem. The potential benefits of the new saddle-point formulation are illustrated with a simple two-level quasi-geostrophic model.
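In the notation commonly used for this formulation (which may differ in detail from the article), the inner-loop problem is solved as the augmented saddle-point system below:

```latex
% Schematic saddle-point system of weak-constraint 4D-Var.
% D = blkdiag(B, Q_1, ..., Q_N): background- and model-error covariances,
% R = blkdiag(R_0, ..., R_N):    observation-error covariances,
% H: block-diagonal (linearized) observation operator,
% L: block-bidiagonal operator built from the tangent-linear model M_k,
% b, d: background/model-error and observation innovation vectors.
\begin{equation}
\begin{pmatrix}
  D & 0 & L \\
  0 & R & H \\
  L^{\mathrm{T}} & H^{\mathrm{T}} & 0
\end{pmatrix}
\begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix}
=
\begin{pmatrix} b \\ d \\ 0 \end{pmatrix},
\qquad
L =
\begin{pmatrix}
  I      &        &        &   \\
  -M_{1} & I      &        &   \\
         & \ddots & \ddots &   \\
         &        & -M_{N} & I
\end{pmatrix}.
\end{equation}
```

Applying L and its transpose only requires independent applications of the individual tangent-linear models M_k across sub-windows, which is the source of the time-parallelism highlighted above.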
Article
An advancement of the unstructured-mesh finite-volume MPDATA (Multidimensional Positive Definite Advection Transport Algorithm) is presented that formulates the error-compensative pseudo-velocity of the scheme to rely only on face-normal advective fluxes to the dual cells, in contrast to the full vector employed in previous implementations. This is essentially achieved by expressing the temporal truncation error underlying the pseudo-velocity in a form consistent with the flux-divergence of the governing conservation law. The development is especially important for integrating fluid dynamics equations on non-rectilinear meshes whenever face-normal advective mass fluxes are employed for transport compatible with mass continuity—the latter being essential for flux-form schemes. In particular, the proposed formulation enables large-time-step semi-implicit finite-volume integration of the compressible Euler equations using MPDATA on arbitrary hybrid computational meshes. Furthermore, it facilitates multiple error-compensative iterations of the finite-volume MPDATA and improved overall accuracy. The advancement combines straightforwardly with earlier developments, such as the nonoscillatory option, the infinite-gauge variant, and moving curvilinear meshes. A comprehensive description of the scheme is provided for a hybrid horizontally-unstructured vertically-structured computational mesh for efficient global atmospheric flow modelling. The proposed finite-volume MPDATA is verified using selected 3D global atmospheric benchmark simulations, representative of hydrostatic and non-hydrostatic flow regimes. Besides the added capabilities, the scheme retains fully the efficacy of established finite-volume MPDATA formulations.
Article
Earth's climate is a nonlinear dynamical system with scale-dependent Lyapunov exponents. As such, an important theoretical question for modeling weather and climate is how much real information is carried in a model's physical variables as a function of scale and variable type. Answering this question is of crucial practical importance given that the development of weather and climate models is strongly constrained by available supercomputer power. As a starting point for answering this question, the impact of limiting almost all real-number variables in the forecasting mode of ECMWF Integrated Forecast System (IFS) from 64 to 32 bits is investigated. Results for annual integrations and medium-range ensemble forecasts indicate no noticeable reduction in accuracy, and an average gain in computational efficiency by approximately 40%. This study provides the motivation for more scale-selective reductions in numerical precision.
Conference Paper
Many high-performance computing applications solving partial differential equations (PDEs) can be attributed to the class of kernels using stencils on structured grids. Due to the disparity between floating point operation throughput and main memory bandwidth these codes typically achieve only a low fraction of peak performance. Unfortunately, stencil computation optimization techniques are often hardware dependent and lead to a significant increase in code complexity. We present a domain-specific tool, STELLA, which eases the burden of the application developer by separating the architecture dependent implementation strategy from the user-code and is targeted at multi- and manycore processors. On the example of a numerical weather prediction and regional climate model (COSMO) we demonstrate the usefulness of STELLA for a real-world production code. The dynamical core based on STELLA achieves a speedup factor of 1.8x (CPU) and 5.8x (GPU) with respect to the legacy code while reducing the complexity of the user code.
Article
This paper provides a detailed theoretical analysis of methods to approximate the solutions of high-dimensional (>10^6) linear Bayesian problems. An optimal low-rank projection that maximizes the information content of the Bayesian inversion is proposed and efficiently constructed using a scalable randomized SVD algorithm. Useful optimality results are established for the associated posterior error covariance matrix and posterior mean approximations, which are further investigated in a numerical experiment consisting of a large-scale atmospheric tracer transport source-inversion problem. This method proves to be a robust and efficient approach to dimension reduction, as well as a natural framework to analyze the information content of the inversion. Possible extensions of this approach to the non-linear framework in the context of operational numerical weather forecast data assimilation systems based on the incremental 4D-Var technique are also discussed, and a detailed implementation of a new Randomized Incremental Optimal Technique (RIOT) for 4D-Var algorithms leveraging our theoretical results is proposed.
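A building block of such methods is an inexpensive low-rank factorization obtained with a randomized SVD. The sketch below shows only that ingredient (a stand-in matrix and scikit-learn's randomized_svd), not the optimal projection or the RIOT algorithm themselves:

```python
# Rank-k approximation of a large symmetric operator via randomized SVD.
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 2000))
A = A @ A.T / 2000                      # stand-in for a prior-preconditioned Hessian (SPD)

U, s, Vt = randomized_svd(A, n_components=50, n_oversamples=10, random_state=0)
A_lowrank = U @ np.diag(s) @ Vt         # used in place of A in the Bayesian update
```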
Article
We have developed a set of reduced, proxy applications ("MiniApps") based on large-scale application codes supported at the Oak Ridge Leadership Computing Facility (OLCF). The MiniApps are designed to encapsulate the details of the most important (i.e., the most time-consuming and/or unique) facets of the applications that run in production mode on the OLCF. In each case, we have produced or plan to produce individual versions of the MiniApps using different programming models (e.g., OpenACC, CUDA, OpenMP). We describe some of our initial observations regarding these different implementations, along with estimates of how closely the MiniApps track the actual performance characteristics (in particular, the overall scalability) of the large-scale applications from which they are derived.
Article
Hybrid variational-ensemble data assimilation (hybrid DA) is widely used in research and operational systems, and it is considered the current state of the art for the initialization of numerical weather prediction models. However, hybrid DA requires a separate ensemble DA to estimate the uncertainty in the deterministic variational DA, which can be suboptimal both technically and scientifically. A new framework called the ensemble-variational integrated localized (EVIL) data assimilation addresses this inconvenience by updating the ensemble analyses using information from the variational deterministic system. The goal of EVIL is to encompass and generalize existing ensemble Kalman filter methods in a variational framework. Particular attention is devoted to the affordability and efficiency of the algorithm in preparation for operational applications.
Article
The paper documents the development of a global nonhydrostatic finite-volume module designed to enhance an established spectral-transform based numerical weather prediction (NWP) model. The module adheres to NWP standards, with formulation of the governing equations based on the classical meteorological latitude-longitude spherical framework. In the horizontal, a bespoke unstructured mesh with finite-volumes built about the reduced Gaussian grid of the existing NWP model circumvents the notorious stiffness in the polar regions of the spherical framework. All dependent variables are co-located, accommodating both spectral-transform and grid-point solutions at the same physical locations. In the vertical, a uniform finite-difference discretisation facilitates the solution of intricate elliptic problems in thin spherical shells, while the pliancy of the physical vertical coordinate is delegated to generalised continuous transformations between computational and physical space. The newly developed module assumes the compressible Euler equations as default, but includes reduced soundproof PDEs as an option. Furthermore, it employs semi-implicit forward-in-time integrators of the governing PDE systems, akin to but more general than those used in the NWP model. The module shares the equal regions parallelisation scheme with the NWP model, with multiple layers of parallelism hybridising MPI tasks and OpenMP threads. The efficacy of the developed nonhydrostatic module is illustrated with benchmarks of idealised global weather.
Article
A coupled data assimilation system has been developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), which is intended to be used for the production of global reanalyses of the recent climate. The system assimilates a wide variety of ocean and atmospheric observations and produces ocean-atmosphere analyses with a coupled model. Employing the coupled model constraint in the analysis implies that assimilation of an ocean observation has immediate impact on the atmospheric state estimate, and, conversely, assimilation of an atmospheric observation affects the ocean state. This covariance between atmosphere and ocean induced by the analysis method is illustrated with simple numerical experiments. Realistic data assimilation experiments based on the global observing system are then used to assess the quality of the assimilation method. Comparison with an uncoupled system shows overall a mostly neutral impact, with slightly improved temperature estimates in the upper ocean and the lower atmosphere. These preliminary results are considered of interest for the ongoing community efforts on coupled data assimilations.
Article
Writing efficient scientific software that makes best use of the increasing complexity of computer architectures requires bringing together modelling, applied mathematics and computer engineering. Physics may help unite these approaches.
Article
We present an adaptive discretization approach for model equations typical of numerical weather prediction, which combines the semi-Lagrangian technique with the TR-BDF2 semi-implicit time discretization method and with a DG spatial discretization with variable and adaptive element degree. The resulting method has full second order accuracy in time, can employ polynomial bases of arbitrarily high degree in space, is unconditionally stable and can effectively adapt at runtime the number of degrees of freedom employed in each element, in order to balance accuracy and computational cost. Furthermore, although the proposed method can be implemented on arbitrary unstructured and nonconforming meshes, even its application on simple Cartesian meshes in spherical coordinates can reduce the impact of the coordinate singularity, by reducing the polynomial degree used in the polar elements. Numerical results are presented, obtained on classical benchmarks with two-dimensional models implementing discretizations of the shallow water equations and of the Euler equations on a vertical slice, respectively. The results confirm that the proposed method has a significant potential for NWP applications.
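For reference, the standard TR-BDF2 scheme mentioned above combines a trapezoidal step to an intermediate time level with a BDF2 step to the new time level; the form below (with the usual choice γ = 2 - √2) is the textbook version, which the paper applies in a semi-Lagrangian, semi-implicit setting:

```latex
% TR-BDF2 for dy/dt = f(y), with off-centering parameter gamma (typically 2 - sqrt(2)).
\begin{align}
  y^{n+\gamma} &= y^{n} + \frac{\gamma\,\Delta t}{2}\left[f\!\left(y^{n}\right) + f\!\left(y^{n+\gamma}\right)\right],\\
  y^{n+1}      &= \frac{1}{\gamma\,(2-\gamma)}\, y^{n+\gamma}
                 - \frac{(1-\gamma)^{2}}{\gamma\,(2-\gamma)}\, y^{n}
                 + \frac{1-\gamma}{2-\gamma}\,\Delta t\, f\!\left(y^{n+1}\right).
\end{align}
```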
Article
The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and a diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. A major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos' abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. The Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.
Conference Paper
Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
Conference Paper
Today the European Centre for Medium-Range Weather Forecasts (ECMWF) runs a 16 km global T1279 operational weather forecast model using 1,536 cores of an IBM Power7. Following the historical evolution in resolution upgrades, ECMWF could expect to be running a 2.5 km global forecast model by 2030 on an Exascale system that should be available and hopefully affordable by then. To achieve this would require IFS to run efficiently on about 1000 times the number of cores it uses today. This is a significant challenge, one that we are addressing within the CRESTA project. After implementing an initial set of improvements ECMWF has now demonstrated IFS running a 10 km global model efficiently on over 50,000 cores of HECToR, a Cray XE6 at EPCC, Edinburgh. Of course, getting to over a million cores remains a formidable challenge, and many scalability improvements have yet to be implemented. Within CRESTA, ECMWF is exploring the use of Fortran2008 coarrays; in particular it is possibly the first time that coarrays have been used in a world leading production application within the context of OpenMP parallel regions. The purpose of these optimizations is primarily to allow the overlap of computation and communication, and further, in the semi-Lagrangian advection scheme, to reduce the volume of data communicated. The importance of this research is such that if these developments are successful then the IFS model may continue to use the spectral method to 2030 and beyond on an Exascale sized system.
Article
A semi-implicit and semi-Lagrangian discontinuous Galerkin method for the shallow water equations is proposed, for applications to geophysical-scale flows. A non-conservative formulation of the advection equation is employed, in order to achieve a more tractable form of the linear system to be solved at each time step. The method is equipped with a simple p-adaptivity criterion that allows the number of local degrees of freedom to be adjusted dynamically to the local structure of the solution. Numerical results show that the method captures well the main features of gravity and inertial gravity waves, as well as reproducing correct solutions in nonlinear test cases with analytic solutions. The accuracy and effectiveness of the method are also demonstrated by numerical results obtained at high Courant numbers and with automatic choice of the local approximation degree.
Article
We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.
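The model itself fits in a few lines; the hardware numbers below are made up for illustration only:

```python
# Roofline: attainable performance = min(peak FLOP rate, bandwidth * arithmetic intensity).
def roofline_gflops(arithmetic_intensity_flop_per_byte,
                    peak_gflops=1000.0, peak_bandwidth_gbs=200.0):
    return min(peak_gflops,
               peak_bandwidth_gbs * arithmetic_intensity_flop_per_byte)

for ai in (0.25, 1.0, 5.0, 20.0):             # FLOP per byte moved from DRAM
    print(f"AI = {ai:5.2f} flop/byte -> attainable {roofline_gflops(ai):7.1f} GFLOP/s")
```

Kernels to the left of the ridge point (here 5 flop/byte) are bandwidth-bound, while those to the right are compute-bound.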