D1.3
Development of Atlas, a flexible data structure framework

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 671627

Dissemination Level: public

Preparing Forecasting Systems for the Next Generation of Supercomputers

Co-ordinated by ECMWF
Funded by the European Union

Ref. Ares(2017)1723184 - 31/03/2017

Research and Innovation Action
H2020-FETHPC-2014

Project Coordinator: Dr. Peter Bauer (ECMWF)
Project Start Date: 01/10/2015
Project Duration: 36 Months
Published by the ESCAPE Consortium

Version: 1.0
Contractual Delivery Date: 31/03/2017
Work Package/Task: WP1 / T1.2
Document Owner: ECMWF
Contributors: ECMWF, RMI, MeteoSwiss
Status: Final

Energy-efficient Scalable Algorithms for Weather Prediction at Exascale

Author: Willem Deconinck
Date: 31/03/2017
Contents

1 Executive Summary
2 Introduction
  2.1 Background
  2.2 Scope of this deliverable
    2.2.1 Objectives of this deliverable
    2.2.2 Work performed on this deliverable
    2.2.3 Deviations and counter measures
3 Getting started with Atlas
  3.1 System requirements
  3.2 Downloading Atlas
  3.3 Compilation and Installation of Atlas
  3.4 Inspecting your Atlas installation
  3.5 Using Atlas in your project
4 Atlas design and implementation
  4.1 Programming languages
  4.2 Grid
    4.2.1 Projection
    4.2.2 Domain
    4.2.3 Supported Grid types
      4.2.3.1 UnstructuredGrid
      4.2.3.2 StructuredGrid
      4.2.3.3 RegularGrid
      4.2.3.4 ReducedGrid
      4.2.3.5 GaussianGrid
      4.2.3.6 RegularGaussianGrid
      4.2.3.7 ReducedGaussianGrid
      4.2.3.8 RegularLonLatGrid
      4.2.3.9 RegularPeriodicGrid
      4.2.3.10 RegularRegionalGrid
    4.2.4 Partitioner
      4.2.4.1 Checkerboard Partitioner
      4.2.4.2 EqualRegions Partitioner
      4.2.4.3 MatchingMesh Partitioner
  4.3 Mesh
  4.4 Parallelisation
  4.5 FunctionSpace
  4.6 Field
  4.7 Mathematical Operations
5 Accelerator Support
  5.1 GridTools storage layer
  5.2 Atlas and GridTools storage integration
  5.3 Fortran fields on accelerators
6 Conclusions
7 References
1 Executive Summary
The algorithms underlying numerical weather prediction (NWP) and climate models
that have been developed in the past few decades face an increasing challenge caused
by the paradigm shift imposed by hardware vendors towards more energy-efficient
devices. This is because the Dennard scaling (constant power consumption with
increasing transistor density) has ended for traditional CPU cores. Rather than
increasing clock speeds of the chips, performance is increased by adding more chips,
and increasing parallelism. In order to provide a sustainable path to exascale High
Performance Computing (HPC), applications become increasingly restricted by
energy consumption. As a result, the emerging diverse and complex hardware
solutions have a large impact on the programming models traditionally used in
NWP software, triggering a rethink of design choices for future massively parallel
software frameworks. In this deliverable report, we present Atlas, a new software
library that is currently being developed at the European Centre for Medium-Range
Weather Forecasts (ECMWF), with the scope of handling data structures required
for NWP applications in a flexible and massively parallel way. Atlas provides
a versatile framework for the future development of efficient NWP and climate
applications on emerging HPC architectures. The applications range from full
Earth system models, to specific tools required for post-processing weather forecast
products. The Atlas library thus constitutes a step towards affordable exascale
high-performance simulations by providing the necessary abstractions that facilitate
the application in heterogeneous HPC environments by promoting the co-design of
NWP algorithms with the underlying hardware.
Atlas provides data structures for building various numerical strategies to solve
equations on the sphere or on limited areas of the sphere. These data structures
may contain a distribution of points (grid) and, possibly, a composition of elements
(mesh), required to implement the numerical operations. Atlas can also
represent a given field within a specific spatial projection. Atlas is capable of
mapping fields between different grids as part of pre- and post-processing stages or
as part of coupling processes whose respective fields are discretised on different grids
or meshes. The latter is particularly relevant for the physical parametrisations,
where some physical processes such as radiation may be represented on a coarser
grid or mesh and may need to be projected onto a finer grid or mesh.
The key concepts in the design of the Atlas data structure are:
Grid: ordered list of points (coordinates) without connectivity rules;
Mesh: collection of elements linking the grid points by specific connectivity
rules;
Field: array of discrete values representing a given quantity;
FunctionSpace: discretisation space in which a field is defined.
These concepts are depicted in Figure 1, where we used the sphere to represent a global grid, mesh and field.

Figure 1: The conceptual design of Atlas. (A Grid is interpreted by a FunctionSpace, which links a distributed Mesh to distributed Fields; discretisations include finite volume, continuous and discontinuous spectral element, and spectral transform methods.)

A grid is merely a predefined list of two-dimensional points, typically structured and using two indices i and j, so that point coordinates and computational stencils (e.g. for derivatives) are easily retrieved without connectivity rules. For models using a structured grid point approach, a grid is enough to define fields with appropriate indexing mechanisms. For element-based numerical methods (generally unstructured), however, the mesh concept is introduced, which describes connectivity lists linking elements, edges and nodes.
A mesh may be decomposed into partitions and distributed among MPI tasks. Each MPI task then performs computations on one such partition. Overlap regions (or halos) between partitions can be constructed to enable stencil operations in a parallel context.
In addition to these two features, it is necessary to introduce the concept of a field, intended as a container of values of a given variable. A field can be discretised in various ways. The concept responsible for interpreting and providing the discretisation of a field in terms of spatial projection (e.g. grid points, mesh nodes, mesh cell centres) or spectral coefficients is the function space. The function space also implements the parallel communication operations responsible for synchronising fields across overlap regions, which we refer to as halo-exchange hereafter.
A possible Atlas workflow, consisting of the creation and discretisation of a field, is illustrated in Figure 2, where we also emphasise some additional characteristics of each step.

Figure 2: Workflow of Atlas starting from a Grid to the creation of a Field, discretised on a Mesh and managed by a FunctionSpace. (The figure annotates the steps with characteristics such as point locations and ordering, distribution and parallelisation, halos, memory layout, hardware, and metadata.)

The building blocks illustrated in Figure 2 can then be used to implement additional operations required for specific applications. Atlas supplies certain mathematical operations as ready solutions to be plugged into user software. These operations vary from the computation of gradient, divergence, curl and Laplacian operations to remapping or interpolation of fields defined on different grids.
2 Introduction
2.1 Background
ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction
at Exascale. The project will develop world-class, extreme-scale computing capa-
bilities for European operational numerical weather prediction and future climate
models. ESCAPE addresses the ETP4HPC Strategic Research Agenda ‘Energy and
resiliency’ priority topic, developing a holistic understanding of energy-efficiency
for extreme-scale applications using heterogeneous architectures, accelerators and
special compute units by:
Defining and encapsulating the fundamental algorithmic building blocks
underlying weather and climate computing;
Combining cutting-edge research on algorithm development for use in extreme-
scale, high-performance computing applications, minimising time- and cost-
to-solution;
Synthesising the complementary skills of leading weather forecasting consortia,
university research, high-performance computing centres, and innovative
hardware companies.
ESCAPE is funded by the European Commission’s Horizon 2020 funding framework
under the Future and Emerging Technologies - High-Performance Computing call
for research and innovation actions issued in 2014.
2.2 Scope of this deliverable
2.2.1 Objectives of this deliverable
The Atlas library is a software library being developed at ECMWF in the context
of its Scalability Programme. As such, at the initiation of ESCAPE, the library
was already in a functional state to support the development of the existing dwarfs
(see Deliverable D1.1 [1]). It was however still in an early development stage.
This deliverable aims at providing an established, first official and documented
release of the Atlas library. This release is intended only to be used by ESCAPE
partners, with the aim of providing a new stable version of the Atlas library to
improve the ESCAPE Weather and Climate Dwarfs (see Deliverable D1.1 [1]) and
to develop new dwarfs (see Deliverable D1.2 [2]).
Most available dwarfs delivered in Deliverable D1.2 embody algorithms defined
using domains that span the entire globe. ESCAPE however requires application
of these dwarfs to non-global or regional domains. The delivered Atlas release
therefore also includes new capabilities to accommodate algorithms on regional grids,
which have been established in Deliverable D4.4 [3].
Further ESCAPE developments also include the application of a Domain Specific
Language (DSL) to several dwarfs. The DSL can have different backends, each
capable of executing algorithms on different HPC hardware architectures
(CPU, GPU, MIC). GPU architectures in particular are very different in nature, and
algorithms may require copying data back and forth from a host architecture (CPU)
to a device (GPU) where computations on the data are performed (see Deliverable
D2.4 [4]). The delivered Atlas release therefore also includes a new advanced
data-storage facility that accommodates host-device synchronisation capabilities with
different backends. In practice the GPU backend is currently implemented only for
GPUs programmable with the CUDA language (NVIDIA) [5].
2.2.2 Work performed on this deliverable
As suggested in Section 2.2.1, the Atlas library was in an early development stage
at ESCAPE's initiation. The majority of the work performed between
ESCAPE's initiation and the delivery date has been to design and implement new
capabilities, as well as to redesign and reimplement existing capabilities to accommodate
new or evolving requirements. Existing capabilities have been redesigned
to make the library easier to use. Other capabilities have been removed and
were instead implemented in other, more general support libraries, eckit and fckit (see
Section 3).
As part of this deliverable, the Atlas library has been made to compile successfully
using the GNU, Intel, Cray and PGI compiler suites. More specifically, compiling
the modern Fortran 2008 interfaces using the PGI compiler suite proved to be
not straightforward due to existing compiler bugs. Workarounds were implemented
in Atlas that allowed PGI's Fortran compiler to compile all of Atlas's capabilities
successfully. The compiler bugs have been reported to PGI and will be fixed in the
upcoming PGI release. Contacts through ESCAPE partner NVIDIA have sped up
this process significantly.
ECMWF and ESCAPE partner MeteoSwiss have collaborated to devise a strategy to accommodate the use of Atlas as a storage backend for unstructured meshes in the GridTools DSL developed at MeteoSwiss, which will be required for ESCAPE deliverable D2.4 [4]. The GridTools library [6] provides a domain specific language (DSL) that allows numerical operators generated from discretisations to be written in a performance-portable way, abstracting details of the implementation and optimisations specific to the hardware architecture. The ESCAPE deliverable D2.4 will extend the DSL to support unstructured meshes. In order to enable the use of the Atlas unstructured meshes by the DSL, the Atlas data structures have been extended to support GPU accelerators. To further enhance the interoperability between Atlas and the DSL, the GPU support has been implemented using the GridTools storage framework. Additionally, the work performed to support Atlas unstructured meshes on GPUs allows Fortran dwarfs to be ported to GPU using OpenACC.
At the onset of the ESCAPE project, Atlas supported mesh generation capabilities
for global grids covering the sphere. However, as Atlas is targeted to be used also
in regional NWP models, these capabilities needed to be extended to regional
grids. This work was mainly done in ESCAPE deliverable D4.4 [3]. Further work
on this subject during this deliverable involved consolidating the work performed
in deliverable D4.4 and redesigning further features that this major work required.
2.2.3 Deviations and counter measures
Even though Atlas now fully supports mesh generation for regional grids as required
by the majority of limited-area models, there is more work that can be done to
support other aspects in Atlas, such as mathematical operators (gradient, divergence,
curl) taking into account the projections used for a regional grid. Although this
work is ongoing during the course of the ESCAPE project, it is not foreseen at this
moment as a critical requirement for developing algorithms relying on Atlas for
limited-area modelling purposes.
With this deliverable, Atlas has been made accelerator-aware in terms of data
structure. It was envisioned to also support parallel communication operations
between accelerators (e.g. GPUs), effectively bypassing the host (CPU). An example
would be the support of halo-exchanges between mesh partitions. This support
can be seen as an optimisation rather than an obstacle in the development of
accelerator-aware algorithms relying on Atlas. It is therefore not critical for this
deliverable.
A new stable Atlas release will be delivered for ESCAPE at the end of 2017, which
will address these issues. To keep track of the remaining work, it has been added as
JIRA tasks in the ESCAPE software collaboration platform [7]. ESCAPE partners
will be kept up to date as new features become available in the meantime. This
strategy has proven to work effectively over the course of ESCAPE so far.
3 Getting started with Atlas
This section is intended as a general introduction on how to download, install
and run Atlas. In particular, in Section 3.1 we present the general system
requirements for building the library. Section 3.2 details how to download
Atlas and its internal dependencies. In Section 3.3 we first describe how to
install the internal dependencies required by Atlas (if supported by ECMWF) and
subsequently outline how to install Atlas itself. Section 3.4 then explains
how to check the installation. Finally, in Section 3.5 we show how to incorporate
Atlas in your own software by creating a simple example that initialises and finalises
the library.
3.1 System requirements
The system requirements for Atlas can be summarised as follows:

POSIX: The operating system must be POSIX compliant. Currently this limits the use to UNIX, Linux, and MacOSX operating systems.

C++11, Fortran 2008 (optional): Atlas uses the programming languages C++ and optionally Fortran. The required standards for these languages are respectively C++11 and Fortran 2008.

OpenMP for C++ (optional): In order for Atlas to optionally be able to take advantage of OpenMP multi-threading, the C++ compiler is required to support OpenMP version 3.

MPI for C (optional): To use Atlas in a distributed memory application, the system needs to have the MPI libraries for the C language available.

Git: Required for project management and to download Atlas. For use and installation see https://git-scm.com/

CMake: The compilation or build system of Atlas is based on CMake 3.3 or higher, which is required to be present on the system. For use and installation see http://www.cmake.org/

Python: Required for certain components of the build system. For use and installation see https://www.python.org/ (Known to work with version 2.7.12)

Boost (optional): The Atlas installation process can optionally compile unit-tests to check if Atlas is correctly installed. To compile these optional unit-tests, the Boost C++ library is required to be present on the system. For use and installation see http://www.boost.org/ (Known to work with Boost 1.61.0)

CUDA (optional): Atlas can also optionally make use of the GridTools storage layer to support use on accelerator hardware. A requirement here is also the Boost C++ library. When intended for a GPU accelerator, an additional requirement is that CUDA 6.0 or greater be installed on the system.

FFTW (optional): Atlas can optionally perform spectral transform operations, which in the most general case require that FFTW be present on the system.
3.2 Downloading Atlas
Apart from the system requirements outlined in Section 3.1, Atlas has a number of internal dependencies that are not all publicly available or require modifications for ESCAPE:

ecbuild: It implements some CMake macros that are useful for configuring and compiling Atlas and the other internal dependencies required by Atlas. For further information, please visit: https://software.ecmwf.int/wiki/display/ECBUILD/ecBuild

eckit: It implements some useful C++ functionalities widely used in ECMWF C++ projects. For further information, please visit: https://software.ecmwf.int/wiki/display/ECKIT/ecKit

fckit (optional): It implements some useful Fortran functionalities.

trans, transi (optional): The trans library implements spectral transform methods (in Fortran), and transi exposes these methods to be used in C/C++.

gridtools_storage (optional): It implements accelerator-aware data structures.

Atlas and the listed internal dependencies are distributed as Git repositories and are available at ECMWF's Bitbucket git hosting service for ESCAPE: https://software.ecmwf.int/stash/projects/ESCAPE. The versions of Atlas and its internal dependencies released for this deliverable are tagged in their respective Git repositories with the Git tag "escape/D1.3". Access to this service is currently restricted to ESCAPE partners only. A public access version is to be released with ESCAPE deliverable D2.3 (31 December 2017), including all its dependencies, excluding the optional trans and transi projects.
To download Atlas and its internal dependencies, the following instructions are to be used on the command line:
export ESCAPE=https://software.ecmwf.int/stash/scm/escape
export SRC=$(pwd)/source
mkdir -p ${SRC}
cd ${SRC}
git clone -b escape/D1.3 ${ESCAPE}/ecbuild
git clone -b escape/D1.3 ${ESCAPE}/eckit
git clone -b escape/D1.3 ${ESCAPE}/fckit
git clone -b escape/D1.3 ${ESCAPE}/trans
git clone -b escape/D1.3 ${ESCAPE}/transi
git clone -b escape/D1.3 ${ESCAPE}/gridtools_storage
git clone -b escape/D1.3 ${ESCAPE}/atlas
3.3 Compilation and Installation of Atlas
In the following we outline how to build and install Atlas and each of the projects Atlas depends on that are not covered by the system requirements. The first step is to create a folder in which to build and install each project, and to choose a compilation optimisation level. The following three optimisation levels are recommended:

DEBUG: No optimisation; used for debugging or development purposes only. This option may enable additional bounds checking.

BIT: Maximum optimisation while remaining bit-reproducible.

RELEASE: Maximum optimisation. For some algorithms and with some compilers, too aggressive optimisation can lead to wrong results.
export BUILD=$(pwd)/build
export INSTALL=$(pwd)/install
export BUILD_TYPE=BIT
export PATH=${PATH}:${SRC}/ecbuild/bin
mkdir -p ${BUILD}/eckit; cd ${BUILD}/eckit
ecbuild --build=${BUILD_TYPE} --prefix=${INSTALL}/eckit -- ${SRC}/eckit
make -j8 install
mkdir -p ${BUILD}/fckit; cd ${BUILD}/fckit
ecbuild --build=${BUILD_TYPE} --prefix=${INSTALL}/fckit -- \
  -DECKIT_PATH=${INSTALL}/eckit \
  ${SRC}/fckit
make -j8 install
mkdir -p ${BUILD}/trans; cd ${BUILD}/trans
ecbuild --build=${BUILD_TYPE} --prefix=${INSTALL}/trans -- \
${SRC}/trans
make -j8 install
mkdir -p ${BUILD}/transi; cd ${BUILD}/transi
ecbuild --build=${BUILD_TYPE} --prefix=${INSTALL}/transi -- \
  -DENABLE_ESCAPE=ON \
  -DTRANS_PATH=${INSTALL}/trans \
  ${SRC}/transi
make -j8 install
mkdir -p ${BUILD}/gridtools_storage; cd ${BUILD}/gridtools_storage
ecbuild --prefix=${INSTALL}/gridtools_storage -- \
${SRC}/gridtools_storage
make -j8 install
mkdir -p ${BUILD}/atlas; cd ${BUILD}/atlas
ecbuild --build=${BUILD_TYPE} --prefix=${INSTALL}/atlas -- \
-DECKIT_PATH=${INSTALL}/eckit \
-DFCKIT_PATH=${INSTALL}/fckit \
-DTRANSI_PATH=${INSTALL}/transi \
-DGRIDTOOLS_STORAGE_PATH=${INSTALL}/gridtools_storage \
${SRC}/atlas
The following extra flags may be added to the Atlas configuration step to fine-tune features:

-DENABLE_OMP=OFF : Enable/Disable OpenMP.

-DENABLE_FORTRAN=OFF : Disable compilation of the Fortran bindings.

-DENABLE_TRANS=OFF : Disable compilation of the spectral transforms functionality. This is automatically disabled if the optional transi dependency is not compiled or found. In this case it is also unnecessary to provide -DTRANSI_PATH=$INSTALL/transi.

-DENABLE_GRIDTOOLS_STORAGE=OFF : Disable gridtools_storage, and enable instead an internal data-storage solution.

-DENABLE_GPU=ON : Enable the GPU backend for gridtools_storage.

-DENABLE_BOUNDSCHECKING=ON : Enable bounds checking in C++ code when indexing arrays. By default BOUNDSCHECKING is ON when the build type is DEBUG, otherwise the default is OFF.
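For illustration, a reconfiguration of Atlas that disables the Fortran bindings and enables the GPU backend of gridtools_storage could combine the flags above with the earlier configuration step as follows (a sketch only; keep the dependency paths relevant to the features you enable):

mkdir -p ${BUILD}/atlas; cd ${BUILD}/atlas
ecbuild --build=${BUILD_TYPE} --prefix=${INSTALL}/atlas -- \
  -DENABLE_FORTRAN=OFF \
  -DENABLE_GPU=ON \
  -DECKIT_PATH=${INSTALL}/eckit \
  -DGRIDTOOLS_STORAGE_PATH=${INSTALL}/gridtools_storage \
  ${SRC}/atlas
make -j8 install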
Note: By default compilation is done using shared libraries. Some systems have linking problems with static libraries that have not been compiled with the flag -fPIC. In this case, also compile Atlas using static linking, by adding the flag --static to the ecbuild step of each project.
Note: The build system for the entire software stack presented above is based on ecbuild, which facilitates portability across multiple platforms. However, some platforms (like ECMWF's HPC) may have a non-standard configuration (in terms of CMake). For these cases ecbuild has a toolchain option, which allows you to provide a custom set of rules for a specific platform. The reader is referred to the ecbuild documentation and the ecbuild help: ecbuild --help
The building and installation of Atlas should now be complete and you can start using it. To that end, the next section shows a simple example of how to create a program that initialises and finalises the library.
3.4 Inspecting your Atlas installation
Once installation of Atlas is complete, an executable called “atlas” can be found in
${INSTALL}/bin/atlas . Example use is listed:
>>> ${INSTALL}/bin/atlas --version
0.10.0
>>> ${INSTALL}/bin/atlas --git
escape/D1.3
>>> ${INSTALL}/bin/atlas --info
atlas version (0.10.0), git (escape/D1.3)
Build:
build type : Release
timestamp : 20160215122606
op. system : Darwin-14.5.0 (macosx.64)
processor : x86_64
c compiler : Clang 7.0.2.7000181
flags : -O3 -DNDEBUG
c++ compiler : Clang 7.0.2.7000181
flags : -O3 -DNDEBUG
fortran compiler: GNU 5.2.0
flags : -fno-openmp -O3 -funroll-all-loops -finline-functions
Features:
Fortran : ON
MPI : ON
OpenMP : OFF
BoundsChecking : OFF
ArrayDataStore : GridTools
GPU : OFF
Trans : ON
Tesselation : ON
gidx_t : 64 bit integer
Dependencies:
eckit version (0.12.3), git (escape/D1.3)
fckit version (0.3.1), git (escape/D1.3)
transi version (0.3.2), git (escape/D1.3)
This executable gives information on, respectively, the version, a more detailed git-version-controlled identifier, and finally a more complete view of all the features that Atlas has been compiled with, as well as compiler and compile-flag information. Also printed are the versions of the used dependencies such as eckit and transi.
3.5 Using Atlas in your project
In this section, we provide a simple example of how to link Atlas into your own software. We will show a simple "Hello world" program that initialises and finalises the library, and uses the internal Atlas logging facilities to print "Hello world!". Note that Atlas supports both C++ and Fortran. Therefore, we show equivalent examples using both C++ and Fortran.
// file: hello-world.cc

#include "atlas/library/Library.h"
#include "atlas/runtime/Log.h"

int main( int argc, char** argv )
{
    atlas::Library::instance().initialise( argc, argv );
    atlas::Log::info() << "Hello world!" << std::endl;
    atlas::Library::instance().finalise();

    return 0;
}

Listing 1: Using Atlas in a C++ project
! file: hello-world.F90

program hello_world

use atlas_module, only : atlas_library, atlas_log

call atlas_library%initialise()
call atlas_log%info("Hello world!")
call atlas_library%finalise()

end program

Listing 2: Using Atlas in a Fortran project
First, the Atlas library is initialised. In C++ this function requires the two command-line arguments argc and argv. In Fortran these arguments are automatically provided by the Fortran runtime environment. This function is used to set up the logging facility and for the initialisation of MPI (Message Passing Interface). Following initialisation, we log "Hello world!" to the info channel. Atlas provides 4 different log channels which can be configured separately: debug, info, warning, and error. By default all log channels print to the std::cout stream, and the debug channel can be switched on or off by setting the environment variable ATLAS_DEBUG=1 or ATLAS_DEBUG=0. Not specifying ATLAS_DEBUG is treated as ATLAS_DEBUG=0. Finally we end the program after finalising the Atlas library.
Note: The logging facility exposed by Atlas is implemented by eckit. The Fortran interface uses fckit, which also delegates its implementation to eckit. For this reason, logging through C++ or Fortran shares the same infrastructure, which ensures that the logging is consistent in mixed C++/Fortran codes.
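As a short illustration of the four channels described above, the following sketch follows the same pattern as Listing 1; it assumes each channel is exposed analogously to the Log::info() accessor used there:

// file: log-channels.cc (illustrative sketch, not part of the Atlas sources)

#include "atlas/library/Library.h"
#include "atlas/runtime/Log.h"

int main( int argc, char** argv )
{
    atlas::Library::instance().initialise( argc, argv );

    atlas::Log::debug()   << "Only shown when ATLAS_DEBUG=1" << std::endl;
    atlas::Log::info()    << "General information"           << std::endl;
    atlas::Log::warning() << "Something looks suspicious"    << std::endl;
    atlas::Log::error()   << "Something went wrong"          << std::endl;

    atlas::Library::instance().finalise();
    return 0;
}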
Standard code compilation
Compiling the C++ example with the GNU C++ compiler:
g++ hello-world.cc -o hello-world \
$(pkg-config ${INSTALL}/atlas/lib/pkgconfig/atlas.pc --libs --cflags)
Compiling the Fortran example with the GNU Fortran compiler:
gfortran hello-world.F90 -o hello-world \
$(pkg-config ${INSTALL}/atlas/lib/pkgconfig/atlas.pc --libs --cflags)
We can now run the executable:
>>> ./hello-world
Hello world!
We can run the same executable with debug output printed during Atlas initialisation:

>>> ATLAS_DEBUG=1 ./hello-world

The output now shows, in addition to Hello world!, also some information such as the version of Atlas we are running, the identifier of the commit and the path of the executable, similar to the output of atlas --info in Section 3.4.
Code compilation using ecbuild
As Atlas is an ecbuild (CMake) project, it integrates easily in other ecbuild (CMake) projects. Two sample ecbuild projects are shown here that compile the "hello-world" example code, for respectively the C++ and the Fortran version.
An example C++ ecbuild project would look like this:

# File: CMakeLists.txt
cmake_minimum_required(VERSION 3.3.2 FATAL_ERROR)
project(hello_world CXX)

include(ecbuild_system NO_POLICY_SCOPE)
ecbuild_requires_macro_version(2.6)
ecbuild_declare_project()
ecbuild_use_package(PROJECT atlas REQUIRED)
ecbuild_add_executable(TARGET hello-world
                       SOURCES hello-world.cc
                       INCLUDES ${ATLAS_INCLUDE_DIRS}
                       LIBS atlas)
ecbuild_print_summary()
An example Fortran ecbuild project would look like this:

# File: CMakeLists.txt
cmake_minimum_required(VERSION 2.8.4 FATAL_ERROR)
project(hello_world Fortran)

include(ecbuild_system NO_POLICY_SCOPE)
ecbuild_requires_macro_version(1.9)
ecbuild_declare_project()
ecbuild_enable_fortran(MODULE_DIRECTORY ${CMAKE_BINARY_DIR}/module
                       REQUIRED)
ecbuild_use_package(PROJECT atlas REQUIRED)
ecbuild_add_executable(TARGET hello-world
                       SOURCES hello-world.F90
                       INCLUDES ${ATLAS_INCLUDE_DIRS}
                                ${CMAKE_CURRENT_BINARY_DIR}
                       LIBS atlas_f)
ecbuild_print_summary()
To compile the ecbuild project, you have to first create an out-of-source build
directory, and point ecbuild to the directory where the CMakeLists.txt is located.
mkdir -p build; cd build
ecbuild -DATLAS_PATH=${INSTALL}/atlas ../
make
Note that in the above command we needed to provide the path to the Atlas library installation. Alternatively, ATLAS_PATH may be defined as an environment variable. This completes the compilation of our first example that uses Atlas and generates an executable in the bin folder (automatically generated by CMake) inside our build directory. For more information on using ecbuild, or CMake, see https://software.ecmwf.int/wiki/display/ECBUILD/ecBuild.

This completes your first project that uses the Atlas library.
4 Atlas design and implementation
This section discusses the design of the most important Atlas concepts and, to a certain level, their implementation details, aided by diagrams formulated in the Unified Modelling Language (UML) [8].
4.1 Programming languages
Atlas is primarily written in the C++ programming language. C++ facilitates OO design and is capable of high-performance computing, the latter due to the support it brings for hardware-specific instructions. In addition, the high compatibility of C++ with C allows Atlas to make use of specific programming models such as CUDA to support GPUs, and facilitates the creation of C-Fortran bindings to create generic Fortran interfaces.

With much of the NWP operational software written in Fortran, significant effort in the Atlas design has been devoted to having a Fortran OO Application Programming Interface (API) wrapping the C++ concepts as closely as possible.
The Fortran API mirrors the C++ classes with a Fortran derived type, whose only data member is a raw pointer to an instance of the matching C++ class. The Fortran derived type also contains member functions or subroutines that delegate their implementation to matching member functions of the C++ class instance. Since Fortran does not directly interoperate with C++, C interfaces to the C++ class member functions are created first, and it is these interfaces that the Fortran derived type delegates to. The whole interaction procedure is shown schematically in Figure 3.

Figure 3: Procedure showing how the Fortran interface to the C++ design is constructed. When a method in the Fortran object is called, it is actually executed by the instance of its matching C++ class, through a C interface.

The overhead created by delegating function calls from the Fortran API to a C++ implementation can be disregarded if performed outside of a computational loop. Atlas is primarily used to manage the data structure in an OO manner, and the actual field data should be accessed from the data structure before a computational loop starts.
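To make the wrapping pattern of Figure 3 more concrete, the following self-contained C++ sketch (illustrative only, not actual Atlas source; all names are hypothetical) shows a class exposed through an extern "C" interface, which is the layer a Fortran derived type would call into via ISO_C_BINDING:

// Hypothetical C++ class, analogous to an Atlas object.
namespace example {
class Counter {
public:
    void increment() { ++count_; }
    int  count() const { return count_; }
private:
    int count_ = 0;
};
}  // namespace example

// C interface: free functions operating on an opaque pointer.
// A Fortran derived type would store this pointer in a type(c_ptr)
// member and delegate its type-bound procedures to these functions.
extern "C" {
example::Counter* example__Counter__new()                    { return new example::Counter(); }
void example__Counter__delete( example::Counter* self )      { delete self; }
void example__Counter__increment( example::Counter* self )   { self->increment(); }
int  example__Counter__count( const example::Counter* self ) { return self->count(); }
}

int main()
{
    example::Counter* c = example__Counter__new();
    example__Counter__increment( c );
    int n = example__Counter__count( c );   // n == 1
    example__Counter__delete( c );
    return n == 1 ? 0 : 1;
}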
4.2 Grid
In the NWP and climate modelling community (as opposed to, for instance, the engineering community) the grid is often a fixed property of a model. One of the goals of Atlas is to provide a catalogue of a variety of global and regional grids defined by the World Meteorological Organisation in order to support multiple models and model inter-comparison initiatives.
There exist three main categories of grids in terms of functionality that Atlas can currently represent: unstructured grids, regular grids, and reduced grids.

Unstructured grids describe an arbitrary number of points in no particular order. The x- and y-coordinates of the points cannot be computed with certain mathematical formulations, and thus have to be specified individually for each point (e.g. Figure 4a).

Regular grids on the other hand make the assumption that points are aligned in both x- and y-direction (e.g. Figure 4c). Grid point coordinates can then be derived from two independent indices (i, j) associated to the x- and y-direction, respectively.

For reduced grids, lines of constant y, or so-called parallels, may have a different number of grid points along the x-direction (Figure 4b and Figure 4d). Reduced grids are a common type of grid employed in global weather and climate models to reduce the number of points towards the poles in order to achieve a quasi-uniform resolution on the sphere.

For both regular and reduced grids, no assumptions are made on the spacing between the parallels in the y-direction. The points in the x-direction on every parallel are assumed to be equispaced.
Atlas provides grid construction facilities based on a configuration object of the type Config to create global or regional grids. For most global grids, this configuration object can also be inferred from a simple string identifier or name containing one or more numbers representing the grid resolution. Commonly used global grids that can currently be accessed through such a name are:

- regular longitude-latitude grid (name: L<NLON>x<NLAT> or L<N>);
- shifted longitude-latitude grid (name: S<NLON>x<NLAT> or S<N>);
- regular Gaussian grid (name: F<N>);
- classic reduced Gaussian grid (name: N<N>);
- octahedral reduced Gaussian grid (name: O<N>).

In these identifiers, <NLON> stands for the number of longitudes, <NLAT> for the number of latitudes, and <N> for the number of parallels between the North Pole and the equator (interval [90, 0)). These grids will be explained in more detail in the following sections.
Figure 4 showcases four example grids that can be created or used with Atlas.

Figure 4: Four examples of global grids in geographical coordinates with approximately similar resolution in the equatorial region: (a) unstructured, (b) classic Gaussian N16, (c) regular lon-lat L16, (d) octahedral Gaussian O16.
4.2.1 Projection
In order to support regional grids for the Limited Area Modelling (LAM) community, projections are often needed that transform so-called grid coordinates (x, y) to geographic coordinates (longitude, latitude). For regional grids, the grid coordinates are often defined in meters on a regular grid, as is the case for e.g. a Lambert conformal conic projection and a Mercator projection. Another example projection that is also applicable to a global grid is the Schmidt projection.
In Atlas, the projection is embodied by a Projection class, illustrated in Figure 5. It wraps an abstract polymorphic ProjectionImplementation class with currently seven concrete implementations:

- LonLat (type: "lonlat", units: "degrees", identity)
- RotatedLonLat (type: "rotated_lonlat", units: "degrees")
- Schmidt (type: "schmidt", units: "degrees")
- RotatedSchmidt (type: "rotated_schmidt", units: "degrees")
- Mercator (type: "mercator", units: "meters", regional)
- RotatedMercator (type: "rotated_mercator", units: "meters", regional)
- Lambert (type: "lambert", units: "meters", regional)
The Projection furthermore exposes functions to convert xy coordinates to lonlat coordinates and its inverse. For more information about each concrete projection implementation, refer to ESCAPE deliverable report D4.4 [3].

Figure 5: UML class diagram for the Projection class. (Only the LonLat projection is equivalent to the identity or no projection; only the Lambert and Mercator projections are regional and have units in meters.)
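As a brief illustration of the interface in Figure 5, the following fragment (written in the style of Listing 3, omitting includes; treat it as a sketch) constructs the identity "lonlat" projection from a configuration object and converts a point in both directions. Concrete projections such as "rotated_lonlat" or "lambert" take additional configuration parameters that are documented in D4.4 [3] and not shown here:

Config projection_config;
projection_config.set( "type", "lonlat" );   // identity projection

Projection projection( projection_config );  // constructor shown in Figure 5

PointXY xy( 5.0, 45.0 );                      // grid coordinates, here in degrees
PointLonLat ll = projection.lonlat( xy );     // identical to xy for the "lonlat" type
PointXY back   = projection.xy( ll );         // inverse transformation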
4.2.2 Domain
In this section, the Domain class is introduced (Figure 6). It is only useful for non-global grids, and can be used to detect whether any coordinate (x, y) is contained within the domain that envelops the grid. The design follows the same principle as the Projection: the Domain class wraps an abstract polymorphic DomainImplementation class with currently 3 concrete implementations:

- Rectangular (type: "rectangular")
- ZonalBand (type: "zonal_band", units: "degrees")
- Global (type: "global", units: "degrees")
Figure 6: UML class diagram for the Domain class
Note: The domain has no knowledge of any grid projection. Therefore the points that can be tested to be contained inside the domain must be provided in "grid coordinates" (x, y), and not in geographical coordinates (lon, lat).
The Rectangular domain defines a rectangular region through 4 values: xmin, xmax, ymin, ymax. These values must be defined in units that correspond to the used grid projection. The ZonalBand domain assumes that the units of x and y are degrees, and that the domain is periodic in the x-direction. Therefore, testing whether a point is contained within this domain only requires checking that the point's y coordinate lies in the interval [ymin, ymax]. The Global domain, like the ZonalBand domain, assumes units in degrees, and always evaluates that any point is contained within it.
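A minimal sketch of constructing and querying a Domain, again in the style of Listing 3 (the "global" type needs no further parameters; the configuration keys of the rectangular and zonal-band types are not shown here):

Config domain_config;
domain_config.set( "type", "global" );

Domain domain( domain_config );               // constructor shown in Figure 6

bool inside = domain.contains( 10.0, 45.0 );  // always true for a global domain
Log::info() << "type: " << domain.type() << ", units: " << domain.units() << "\n";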
4.2.3 Supported Grid types
Atlas provides a basic Grid class that can embody any unstructured, regular or reduced grid. The Grid class is a wrapper to an abstract polymorphic GridImplementation class with 2 concrete implementations: Unstructured and Structured. The Unstructured implementation holds a list of (x, y) coordinates (one pair for each grid point). The Structured implementation follows the assumption of a reduced grid. It holds a list of y-coordinates (one value for each grid parallel), a list of the number of points for each parallel, and a list of x-intervals (one pair for each parallel) in which the points of the parallel are uniformly distributed. With the Structured implementation, both reduced and regular grids can be represented, as regular grids can also be interpreted as a special case of a reduced grid (where every parallel contains the same number of points).
The following code snippets show how to construct a grid from either a configuration object or a name, both in C++ and Fortran.

Config F16_config;
F16_config.set( "type", "regular_gaussian" );
F16_config.set( "N", 16 );
Grid F16( F16_config );   // regular Gaussian grid (F16)
Grid N16( "N16" );        // classic reduced Gaussian grid (N16)

Listing 3: Construction of grids, C++ example

type(atlas_Grid)   :: F16, N16
type(atlas_Config) :: F16_config
F16_config = atlas_Config()
call F16_config%set( "type", "regular_gaussian" )
call F16_config%set( "N", 16 )
F16 = atlas_Grid( F16_config )   ! regular Gaussian grid (F16)
N16 = atlas_Grid( "N16" )        ! classic reduced Gaussian grid (N16)

Listing 4: Construction of grids, Fortran example
Note: Even though the configuration object (F16_config) is here constructed programmatically, it may also be imported through a JSON string or file. The regular Gaussian grid could also be constructed through the name "F16". Similarly, the classic reduced Gaussian grid could also be constructed through a config object with the type "classic_gaussian".
Figure 7 illustrates the Grid class implementation. It shows that the Grid class can return instances of the Domain class and the Projection class.

Figure 7: UML class diagram for the Grid class
Because this basic Grid class can make no assumptions on whether it wraps a Structured or an Unstructured concrete implementation, it can only expose an interface for the most general type of grids: the unstructured approach. This means that we can find out the number of grid points with the size() function, and that we can iterate over all points, assuming no particular order. The following C++ code shows how to iterate over all points, and how to use the projection to get longitude-latitude coordinates.
Grid grid( "O1280" );
Log::info() << "The grid contains " << grid.size() << " points.\n";

for( PointXY p : grid ) {
    Log::info() << "xy: " << p << "\n";
    double x = p.x();
    double y = p.y();

    PointLonLat pll = grid.projection().lonlat( p );
    Log::info() << "lonlat: " << pll << "\n";
    double lon = pll.lon();
    double lat = pll.lat();
}

Listing 5: Iterating over all points of an octahedral reduced Gaussian grid (O1280)
Note: In the above C++ code we used the projection to compute the longitude and latitude coordinates. For the octahedral Gaussian grid used here, however, the projection is of the "lonlat" type by construction, meaning that x and y are already equivalent to lon and lat respectively. The second part of the for loop was thus not necessary for this particular grid.
The basic Grid class shown in Figure 7 also exposes a function uid(), which returns a string that is guaranteed to be unique for every possible grid. This includes differences in projections and domains as well.
To be able to expose more structure or properties present in the grid, a number of "grid interpretation" classes are available that also wrap the used GridImplementation, but try to cast it to the Structured implementation if necessary. Currently available interpretation classes are:

- UnstructuredGrid: The grid is unstructured and cannot be interpreted as structured.
- StructuredGrid: The grid may be regular or reduced.
- RegularGrid: The grid is regular.
- ReducedGrid: The grid is reduced, and not regular.
- GaussianGrid: The grid may be a global regular or reduced Gaussian grid.
- RegularGaussianGrid: The grid is a global regular Gaussian grid.
- ReducedGaussianGrid: The grid is a global reduced Gaussian grid, and not a regular grid.
- RegularLonLatGrid: The grid is a global regular longitude-latitude grid.
- RegularPeriodicGrid: The grid is a periodic (in x) regular grid.
- RegularRegionalGrid: The grid is a regional non-periodic regular grid, and can have any projection.
Note that there is no use case for interpreting a grid as e.g. “octahedral reduced
Gaussian” or “classic reduced Gaussian”, as it does not bring any benefit over the
ReducedGaussianGrid interpretation class.
Just like the basic Grid class, these interpretation classes have a function valid(). Rather than throwing errors or aborting the program if the constraints listed above are not satisfied, the user has to call the valid() function to assert that the interpretation is possible. Figure 8 illustrates the above list schematically. Arrows indicate a "can be interpreted by" relationship.
Figure 8: UML class inheritance diagram for the Grid classes
Note: For an NWP model, you can usually safely assume the grid interpretation, as the model can usually only work with a certain type of grid. ECMWF's IFS model, for instance, can assume that all used grids can be interpreted by the GaussianGrid class, whereas a LAM model could e.g. assume the RegularRegionalGrid interpretation.
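The fragment below (a sketch in the style of Listing 5) illustrates reinterpreting a basic Grid through interpretation classes and checking validity with valid():

Grid grid( "O32" );                  // octahedral reduced Gaussian grid

StructuredGrid structured( grid );   // reinterpretation, no copy is made
if( structured.valid() ) {
    Log::info() << "ny = " << structured.ny() << "\n";
}

RegularGrid regular( grid );         // a reduced grid is not regular
if( not regular.valid() ) {
    Log::info() << "O32 cannot be interpreted as a RegularGrid\n";
}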
4.2.3.1 UnstructuredGrid
The UnstructuredGrid interpretation class constrains the grid implementation to be Unstructured. No assumption on any form of structure can be made. Also, no assumption on the domain or the projection used is made.

Figure 9 shows the UML class diagram of the UnstructuredGrid. The first three constructors listed effectively create a new grid, whereas the last constructor accepts any existing grid and reinterprets it instead. No copy or extra storage is then introduced, since the wrapped GridImplementation is a reference-counted pointer (a.k.a. shared_ptr), whose reference count is increased and decreased upon UnstructuredGrid construction and destruction respectively.
Figure 9: UML class diagram for the UnstructuredGrid class. (An UnstructuredGrid is valid when the internal GridImplementation is Unstructured; lonlat() uses the Projection to convert xy to lonlat.)
An UnstructuredGrid exposes two extra functions, xy(n) and lonlat(n). The first function gives random access to the (x, y) coordinates of grid point n. The second function is a convenience function that internally uses the grid Projection to project the grid coordinates xy(n) to geographic coordinates.
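A short sketch of building an UnstructuredGrid directly from a list of points, using the constructor taking an array of PointXY shown in Figure 9 (the exact container type accepted by this constructor is an assumption here):

std::vector<PointXY> points = { {0.0, 0.0}, {10.0, 0.0}, {10.0, 10.0}, {0.0, 10.0} };

UnstructuredGrid grid( points );   // constructor UnstructuredGrid( xy : PointXY[] )

for( int n = 0; n < grid.size(); ++n ) {
    Log::info() << grid.xy( n ) << "  " << grid.lonlat( n ) << "\n";
}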
4.2.3.2 StructuredGrid
The StructuredGrid interpretation class constrains the grid implementation to be Structured. The grid may be regular or reduced. It makes no assumptions on whether the domain is global, periodic, or regional, or whether any projection is used. Almost any grid with some form of structure in a single area can therefore be interpreted by this class.
Figure 10 shows the UML class diagram of the StructuredGrid. The first two constructors listed effectively create a new grid, whereas the third constructor accepts any Grid and reinterprets it instead, if possible. No copy or extra storage is then introduced, since the wrapped GridImplementation is a reference-counted pointer (a.k.a. shared_ptr), whose reference count is increased and decreased upon StructuredGrid construction and destruction respectively.
Figure 10: UML class diagram for the StructuredGrid class. (A StructuredGrid is valid when the internal GridImplementation is Structured; x follows an equidistant spacing which may differ per line j; periodic == true means that x(iLast, j) is one x-increment from x(iFirst, j) in cylindrical coordinates; lonlat() uses the Projection to convert xy to lonlat.)
With the information that the grid can only be reduced or regular, new accessor functions can be exposed to access grid points more effectively through indices (i, j). The only functions that can be guaranteed to apply for both regular and reduced grids are the ones that assume a reduced grid. This means that the x coordinate and the number of points on a parallel depend on the parallel itself, denoted by the index j. For convenience, a function lonlat(i,j) is available that internally uses the grid Projection to project the grid coordinates xy(i,j) to geographic coordinates.
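A typical double loop over a StructuredGrid using these accessors could look as follows (a sketch in the style of Listing 5):

StructuredGrid grid( "N16" );                  // classic reduced Gaussian grid

for( int j = 0; j < grid.ny(); ++j ) {
    for( int i = 0; i < grid.nx( j ); ++i ) {
        double x = grid.x( i, j );
        double y = grid.y( j );
        PointLonLat ll = grid.lonlat( i, j );  // uses the grid Projection
    }
}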
4.2.3.3 RegularGrid
A RegularGrid is a specialisation of a StructuredGrid with the further constraint that the number of points on every parallel is equal. In other words, points are now also aligned in the y-direction. The grid then forms a Cartesian coordinate system.
With this information, access to the x coordinate of a point is now independent of the index j, and only depends on the index i. The relevant functions that can be adapted now are nx() and x(i). Using these functions can possibly increase the performance of algorithms.
Figure 11: UML class diagram for the RegularGrid class. (For a RegularGrid, every line j has an equal number of points, so that x and nx do not depend on j; a RegularGrid is valid when it is a valid StructuredGrid and nxmin() == nxmax().)
4.2.3.4 ReducedGrid
A ReducedGrid is, unlike the RegularGrid, not a specialisation of the StructuredGrid in terms of functionality, but it does add the constraint that the grid is only valid when it is not regular. Figure 12 shows the class diagram for this type of grid.
Figure 12: UML class diagram for the ReducedGrid class. (A ReducedGrid is valid when it is a valid StructuredGrid and nxmin() != nxmax().)
4.2.3.5 GaussianGrid
A GaussianGrid is a StructuredGrid with the additional constraint that the grid is globally defined with an even number of parallels that follow the roots of a Legendre polynomial in the interval (90°, -90°) [9]. This class exposes an additional function N(), which is the so-called Gaussian number, equivalent to the number of parallels between the North Pole and the equator. The x-coordinate of the first point of each parallel starts at 0° (Greenwich meridian). Figure 13 shows the class diagram for the GaussianGrid.
Figure 13: UML class diagram for the GaussianGrid class. (A GaussianGrid is valid when y follows the roots of Legendre polynomials in the interval (90°, -90°) and the domain is global; N is known as the Gaussian number and is equivalent to ny/2.)
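Following the same pattern as before, a grid known to be Gaussian can be queried for its Gaussian number (a sketch):

Grid grid( "O1280" );              // octahedral reduced Gaussian grid

GaussianGrid gaussian( grid );
if( gaussian.valid() ) {
    Log::info() << "N  = " << gaussian.N()  << "\n";  // parallels between pole and equator
    Log::info() << "ny = " << gaussian.ny() << "\n";  // equal to 2*N
}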
4.2.3.6 RegularGaussianGrid
A RegularGaussianGrid combines the properties of a RegularGrid and a GaussianGrid. It can be defined by a single number N (the Gaussian number). The numbers of points in the x- and y-direction are by convention

nx = 4 N
ny = 2 N
Figure 14 shows the class diagram for the RegularGaussianGrid.

Figure 14: UML class diagram for the RegularGaussianGrid class. (A RegularGaussianGrid is valid when it is a valid RegularGrid and a valid GaussianGrid; nx = 4*N, ny = 2*N.)
As can be seen in the class diagram, an additional constructor is available taking only this Gaussian number N, so that it is easy to create grids of this type. These grids can also be created through the constructor taking the name “F<N>”, with <N> the Gaussian number N.
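As a small illustration (class name and constructors as in Figure 14; headers and namespaces are omitted, as in the other listings), two equivalent ways to create such a grid are:

    RegularGaussianGrid grid_a( 32 );     // from the Gaussian number N = 32
    RegularGaussianGrid grid_b( "F32" );  // from the name "F<N>"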
4.2.3.7 ReducedGaussianGrid
A ReducedGaussianGrid combines the properties of a ReducedGrid and a GaussianGrid. A single number N (the Gaussian number) defines the number of parallels (ny = 2 N), but no assumptions are made on the number of points on each parallel. Figure 15 shows the class diagram for the ReducedGaussianGrid.
Figure 15: UML class diagram for the ReducedGaussianGrid class. ReducedGaussianGrid is valid when it is both a valid ReducedGrid and a valid GaussianGrid.
As can be seen in the class diagram, an additional constructor is available, taking an array of integer values with size equal to the number of parallels (which must be even). The values correspond to the number of points for each parallel. The WMO GRIB standard refers to this array as “PL”, and IFS refers to it as “NLOEN”. In Atlas it is referred to as the array nx (cf. the StructuredGrid). The number of parallels ny is inferred from the length of this array, and the Gaussian number N is then ny/2, which is used to define the y-coordinate of the parallels.
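As a small illustration of this constructor (signature as in Figure 15; passing the values as a std::vector is an assumption, and the numbers below are purely illustrative rather than an operational grid):

    std::vector<int> nx = { 20, 24, 24, 20 };  // ny = 4 parallels, symmetric around the equator
    ReducedGaussianGrid grid( nx );            // N = ny/2 = 2 is inferred from the array length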
Classic reduced Gaussian grids
In practice we tend to use only a small subset of the infinite possible combinations of reduced Gaussian grids for a specific number N. Until around 2016, ECMWF's IFS model used reduced Gaussian grids for which the nx-array was not straightforward to compute. These arrays were tabulated for all reduced Gaussian grids in use. We now refer to these grids as “classic” reduced Gaussian grids, and they can be created through the name “N<N>”, with <N> the Gaussian number N.
Not every value of N is possible, because only a limited number of such grids have been created (only the ones actually used). Atlas can create classic reduced Gaussian grids for values of N in the list [ 16, 24, 32, 48, 64, 80, 96, 128, 160, 200, 256, 320, 400, 512, 576, 640, 800, 1024, 1280, 1600, 2000, 4000, 8000 ].
Octahedral reduced Gaussian grids
Since around 2016, ECMWF's IFS model uses reduced Gaussian grids for which the nx-array can be computed by a simple formula rather than a complex algorithm. These grids are referred to as “octahedral” reduced Gaussian grids. The nx-array can be computed as follows in C++:
int jLast = 2*N - 1;
for( int j = 0; j < N; ++j ) {
    nx[j]       = 20 + 4*j;   // Up to equator
    nx[jLast-j] = nx[j];      // Symmetry around equator
}
Listing 6: Computing the nx-array for octahedral reduced Gaussian grids, C++ example
In order to refer to these grids easily in common language, and to construct them more easily using the constructor taking a name, the name “O<N>” was chosen, with <N> the Gaussian number N and “O” referring to “octahedral”. The term “octahedral” originates from the inspiration to project a regularly triangulated octahedron onto the sphere. A few modifications to the resulting grid were made to make it a suitable reduced Gaussian grid for a spectral transform model [10].
Note
Models or other software applications should not treat the octahedral reduced Gaussian grid as a special case. For all intents and purposes it is still a reduced Gaussian grid, following all requirements laid out by the WMO GRIB standard!
4.2.3.8 RegularLonLatGrid
The RegularLonLatGrid is likely the most commonly used grid on the sphere. It is a global regular grid defined in degrees with a uniform distribution in both the x- and y-direction. Atlas supports 4 variants of the RegularLonLatGrid, each with 2 identifier names:
standard: L<NLON>x<NLAT> or L<N>
shifted: S<NLON>x<NLAT> or S<N>
longitude-shifted: SLON<NLON>x<NLAT> or SLON<N>
latitude-shifted: SLAT<NLON>x<NLAT> or SLAT<N>
In the identifier names, <NLON> and <NLAT> denote respectively nx and ny of a regular grid. For ease of comparison with the Gaussian grids, these grids can also be named instead with a number N denoting the number of parallels in the interval [90°, 0°) between the North Pole and the equator, including the Pole and excluding the equator. The x- and y-increment is then computed as 90°/N. For each of the grids, all points are defined in the range 0° ≤ x < 360° and -90° ≤ y ≤ +90°. For the standard case, the first and last parallel are located exactly at the North and South Pole respectively. Usually the number of parallels ny = <NLAT> is odd, so that there is also exactly one parallel on the equator. It is also guaranteed that the first point on each parallel is located on the Greenwich meridian (x = 0°). In this context, shifted denotes a shift or displacement of the x- and y-coordinates of all points by half an increment with respect to the standard (or unshifted) case. In order to achieve the same x- and y-increment as the standard case, the shifted case should be constructed with one less parallel. The two remaining cases, longitude-shifted and latitude-shifted, shift only the x or y coordinate of each grid point respectively. Figure 16 shows the class diagram for the RegularLonLatGrid. It can be seen that this class exposes 4 functions to query which of the 4 variants is represented.
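As a worked example of the naming rules above (the specific point counts follow from the 90°/N increment rule; the exact equivalence of the two names is an inference, not a quote from the library documentation): for N = 45 the increment is 2°, giving nx = 180 and ny = 91 for the standard variant.

    RegularLonLatGrid grid_a( "L180x91" );  // explicit <NLON> x <NLAT>
    RegularLonLatGrid grid_b( "L45" );      // the same standard grid, named via N = 45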
Figure 16: UML class diagram for the RegularLonLatGrid class. RegularLonLatGrid is valid when it is a valid global RegularGrid, its Projection is “lonlat”, and the spacing in y is equidistant. The variants differ in the first point: standard has x(iFirst) = 0 and y(jFirst) = 90; shifted has x(iFirst) = dx/2 and y(jFirst) = 90 - dy/2; shiftedLon shifts only x; shiftedLat shifts only y.
4.2.3.9 RegularPeriodicGrid
The RegularPeriodicGrid can be used to assert that the grid is a regular grid with equidistant spacing in the x- and y-direction, and with periodicity in the x-direction. The latter enforces an implicit additional constraint that x and y are defined in degrees. Figure 17 shows the class diagram for the RegularPeriodicGrid.
Figure 17: UML class diagram for the RegularPeriodicGrid class. RegularPeriodicGrid is valid when it is a valid RegularGrid, periodic in the x-direction, with equidistant spacing in y; the units of the projection are restricted to degrees.
4.2.3.10 RegularRegionalGrid
The RegularRegionalGrid asserts that the grid is neither global nor periodic. The grid points must be equidistant in both the x- and y-direction. No restrictions on projections are made. This grid would be the typical grid to use in conjunction with e.g. a Lambert, Mercator, or RotatedLonLat projection. Figure 18 shows the class diagram for the RegularRegionalGrid.
Figure 18: UML class diagram for the RegularRegionalGrid class. RegularRegionalGrid is valid when it is a valid RegularGrid that is not global and not periodic, with equidistant spacing in y; no restrictions on projections are made.
Construction of grids of this type can be done in various ways through configuration. Refer to ESCAPE deliverable report D4.4 [3] for more information.
4.2.4 Partitioner
Even though the Grid object itself is not distributed in memory as it does not have
a large memory footprint, it is necessary for parallel algorithms to divide work over
parallel MPI tasks.
There exist various strategies for partitioning a grid, where each strategy may offer different advantages depending on the grid and the numerical algorithms to be used.
Atlas implements a Partitioner class that, given a grid, partitions the grid and creates a Distribution object that describes, for each grid point, which partition it belongs to. Figure 19 illustrates the UML class diagram for the Partitioner class. Following a similar design philosophy as before, the Partitioner class wraps an abstract polymorphic PartitionerImplementation object. Figure 20 illustrates the UML class diagram for the Distribution class.
Figure 19: UML class diagram for the Partitioner class. A Partitioner wraps a PartitionerImplementation (EqualRegions, Checkerboard or MatchingMesh); the MatchingMeshPartitioner constructs its implementation from a Mesh and a configuration.
Figure 20: UML class diagram for the Distribution class. A Distribution is constructed from a Grid and a Partitioner, and maps each grid point to a partition index.
Currently there are 3 concrete implementations of the PartitionerImplementation:
Checkerboard ( type: “checkerboard” ) Partitions a grid in regular zones.
EqualRegions ( type: “equal_regions” ) Partitions a grid in equal regions, reminiscent of a disco ball.
MatchingMesh ( type: “matching_mesh” ) Partitions a grid such that the grid points follow the domain decomposition of an existing mesh, which may be based on a different grid.
The Checkerboard and EqualRegions implementations can be created from a configuration object only. The MatchingMesh implementation requires a further mesh argument to its constructor. For this reason, a MatchingMeshPartitioner class exists whose only purpose is to construct its related MatchingMesh implementation with the extra mesh argument.
4.2.4.1 Checkerboard Partitioner
For regular grids, such as the one depicted in Figure 4c, a logical domain decomposition would be a checkerboard. The grid is then divided as well as possible into approximately rectangular zones in Cartesian grid coordinates (x, y) with an equal number of grid points. An example of this partitioning algorithm is shown in Figure 21.
Figure 21: Example Checkerboard partitioning of a shifted regular longitude-latitude
grid (S64x32) in 32 partitions.
4.2.4.2 EqualRegions Partitioner
For reduced grids such as the ones shown in Figure 4b and Figure 4d, or for uniformly distributed unstructured grids, an “equal regions” domain decomposition is more advantageous [11]–[13]. The “equal regions” partitioning algorithm divides a two-dimensional grid of the sphere (i.e. representing a planet) into bands from the North Pole to the South Pole. These bands are oriented in zonal directions and each band is then split further into regions containing an equal number of grid points.
The only exceptions are the bands containing the North or South Pole, which are not subdivided into regions but constitute the North and South polar caps. An example of this partitioning algorithm is shown in Figure 22.
Figure 22: Example EqualRegions partitioning of an N16 classic reduced Gaussian grid in 32 partitions.
4.2.4.3 MatchingMesh Partitioner
The MatchingMeshPartitioner allows the creation of a Distribution for a grid such that the grid points follow the domain decomposition of an existing mesh (described in detail in Section 4.3). This partitioning strategy is particularly useful when the grid points of a partition should be contained within the mesh partition present on the same MPI task, to avoid parallel communication during coupling or interpolation algorithms. Note that there is no guarantee of any load balance here for the partitioned grid. Figure 23 shows an example application of the MatchingMeshPartitioner.
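A minimal sketch of this use case is given below (constructor as in Figure 19, taking a mesh and a configuration; passing an empty Config() is an assumption), where existing_mesh is a previously generated, distributed Mesh:

    Grid target_grid( "F8" );
    MatchingMeshPartitioner partitioner( existing_mesh, Config() );
    Distribution distribution = partitioner.partition( target_grid );
    // each point of target_grid is assigned to the partition of the mesh
    // subdomain that contains it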
4.3 Mesh
For a wide variety of numerical algorithms, a Grid (i.e. a mere ordering of points and their location) is not sufficient and a Mesh might be required. This is usually obtained by connecting grid points using polygonal elements (also referred to as cells), such as triangles or quadrilaterals. A mesh, denoted by M, can then be defined as a collection of such elements Ω_i:

    M := ∪_{i=1}^{N} Ω_i .    (1)
Figure 23: Example partitioning in 32 parts of an F8 regular Gaussian grid (solid dots) using the domain decomposition of an existing meshed N24 classic reduced Gaussian grid. Each domain is shaded and surrounded by a solid line. The jagged lines of the existing N24 mesh subdomains are contours of its elements.
For regular grids, the mesh elements can be inferred as a blocked arrangement of quadrilaterals. For unstructured grids or reduced grids (Section 4.2), these elements can no longer be inferred, and explicit connectivity rules are required. The Mesh class combines the knowledge of the classes Nodes, Cells and Edges, and provides a means to access connectivities or adjacency relations between these classes. Nodes describes the nodes of the mesh, Cells describes the elements such as triangles and quadrilaterals, and Edges describes the lines connecting the nodes of the mesh.
Figure 24: Mesh composition. A Mesh aggregates Nodes, Cells and Edges; each of these holds fields such as global_index, partition and remote_index, together with connectivity tables (node, edge and cell connectivities) linking them.
Figure 24 sketches the composition of the Mesh class with common access methods for its components. Differently from the Grid, the Mesh may be distributed in memory. The physical domain S is decomposed into sub-domains S_p, and a corresponding mesh partition M_p is defined as:

    M_p := { ∪ Ω_i , Ω_i ⊂ S_p } .    (2)

More details regarding this aspect are given in Section 4.4.
A Mesh may simply be read from file by a MeshReader, or generated from a Grid by a MeshGenerator. The latter option is illustrated in Figure 2, where the grid points become the nodes of the mesh elements. Listing 7 shows how this can be achieved in practice, and Figure 25 visualises the resulting meshes for the grids N16 and O16.
Grid grid(" O 16 " ) ;
M es h Ge n er a to r ge ne r at o r (" st r uc t ur e d " ) ;
Mesh mesh =g en e ra t or .g en e ra t e (grid );
Listing 7: C++ Mesh generation from a StructuredGrid
Note
For UnstructuredGrids, another MeshGenerator needs to be used, based on e.g. Delaunay triangulation (type = “delaunay”). Whereas the StructuredMeshGenerator is able to generate a parallel distributed mesh in one step, the DelaunayMeshGenerator currently only supports generating a non-distributed mesh using one MPI task. In the future it is envisioned that this implementation will be parallel-enabled as well.
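By analogy with Listing 7, a Delaunay-based generation could look as sketched below (the “delaunay” type is named in the note above; unstructured_grid stands for a hypothetical, previously constructed UnstructuredGrid):

    MeshGenerator delaunay_generator( "delaunay" );
    Mesh mesh = delaunay_generator.generate( unstructured_grid );  // currently non-distributed, one MPI task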
(a) classic Gaussian, N16 (b) octahedral Gaussian, O16
Figure 25: Mesh generated for two types of reduced grids (Figure 4)
Because several element types can coexist as cells, the Cells class composes a more complex interplay of classes, such as Elements, ElementType, BlockConnectivity, and MultiBlockConnectivity. This composition is detailed in Figure 26.
Figure 26: Mesh Cells diagram.
Atlas provides various types of connectivity tables: BlockConnectivity, IrregularConnectivity and MultiBlockConnectivity. BlockConnectivity is used when all elements of the mesh are of the same type, while IrregularConnectivity is more flexible and is used when the elements in the mesh can be of any type. The BlockConnectivity implementation has a regular structure of the lookup tables and therefore provides better computational performance than the IrregularConnectivity. Finally, the MultiBlockConnectivity supports those cases where the mesh contains various types of elements but they can still be grouped into collections of elements of the same type, so that numerical algorithms can still benefit from performing operations on one element type at a time. The Elements class provides the view of elements of one type, with node and edge connectivities as a BlockConnectivity. The interpretation of the elements of this one type is delegated to the ElementType class. The Cells class is composed of multiple Elements and provides a unified view of all elements regardless of their shape. The MultiBlockConnectivity provides a matching unified connectivity table. Each block in the MultiBlockConnectivity shares its memory with the BlockConnectivity present in the Elements to avoid memory duplication (see Figure 27).
Figure 27: BlockConnectivity points to blocks of MultiBlockConnectivity. Zig-zag lines denote how the data is laid out contiguously in memory.
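To make the use of these connectivity tables concrete, the sketch below loops over all cells of a mesh and reads their node indices through the unified connectivity; the accessors mesh.cells(), node_connectivity(), size(), cols() and the parenthesis operator follow the composition described above, but their exact signatures are assumptions:

    const auto& cells             = mesh.cells();
    const auto& node_connectivity = cells.node_connectivity();  // unified (multi-block) connectivity
    for( size_t jcell = 0; jcell < cells.size(); ++jcell ) {
      for( size_t jnode = 0; jnode < node_connectivity.cols( jcell ); ++jnode ) {
        size_t node_index = node_connectivity( jcell, jnode );  // local node index of this cell vertex
        // ... use node_index ...
      }
    }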
Although currently the mesh is composed of two-dimensional elements such as quadrilaterals and triangles, three-dimensional mesh elements such as hexahedra, tetrahedra, etc. are envisioned in the design and can be naturally embedded within the presented data structure. However, at least for the foreseeable future in NWP and climate applications, the vertical discretisation may be considered orthogonal to the horizontal discretisation due to the large anisotropy of physical scales in the horizontal and vertical directions. Given a number of vertical levels, polygonal elements in the horizontal are then extruded to prismatic elements oriented in the vertical direction (e.g. [14]).
4.4 Parallelisation
Parallelisation in Atlas is achieved through distributing the Mesh into different partitions, each acting like a smaller mesh, and each mesh partition M_p is managed by one MPI task. The idea is to load-balance numerical computations and memory among the MPI tasks, meaning that every mesh partition has approximately the same number of elements, or the same number of nodes.
Looking back at the typical workflow of how to use Atlas, presented in Figure 2, we start with a Grid object that does not have any notion of parallelisation, and we want to end up with a distributed Mesh object. One approach could be to first generate the mesh from the grid on one MPI task with a MeshGenerator object (see Section 4.3), then call a partitioning algorithm on the mesh, and then distribute the mesh partitions to the other MPI tasks. This approach has major flaws in parallel efficiency, as many MPI tasks are waiting for the computations of the master MPI task to finish, and then wait to receive their mesh partition. Another approach is to partition the grid points before or during the mesh generation step, and to generate each mesh partition using only the grid points whose partition corresponds to the required MPI task. In principle this is applicable to both UnstructuredGrids and StructuredGrids. Currently, however, Atlas has only implemented such a parallel-enabled mesh generator for StructuredGrids.
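A sketch of this second approach is given below; it assumes a MeshGenerator::generate overload that accepts a precomputed Distribution in addition to the Grid, consistent with the partition-then-generate strategy described above:

    Grid grid( "O32" );
    Partitioner partitioner( Config( "type", "equal_regions" ) );
    Distribution distribution = partitioner.partition( grid );

    MeshGenerator generator( "structured" );
    Mesh mesh = generator.generate( grid, distribution );  // each MPI task builds only its own partition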
Examples of two meshes partitioned into different parallel regions using the EqualRegions partitioning algorithm are illustrated in Figure 28.
Every mesh partition can be regarded as an independent mesh, but to allow for computational stencils that span from one mesh partition to the next, overlapping halos are created between relevant mesh partitions. Atlas provides functionality to incrementally grow the overlap between mesh partitions by node-sharing elements. Figure 29 shows the overlap region generated for two such regions, as well as a so-called “periodic overlap region” that can be used to treat the periodic East-West boundary as if it were an internal boundary between mesh partitions. Discrete field values present in overlap regions require synchronisation with the values of neighbouring partitions in order to perform stencil operations. For this synchronisation, the mesh partition must be aware of how it fits inside the whole mesh.
Figure 28: EqualRegions domain decomposition. Left: O1280 mesh with 6.6 million nodes (approximately 9 km grid spacing) in 1600 partitions. Right: O32 mesh with 5248 nodes (approximately 280 km grid spacing) in 32 partitions.
Figure 29: Parallel overlap regions or halos shown for an O32 mesh with 32 partitions.
As shown in Figure 24, the Nodes, Cells, and Edges classes contain three fields, intended as discrete values, that provide exactly this awareness.
The field named global_index contains a unique global index or ID for each node or element in the mesh partition, as if the mesh was not distributed. The global index is independent of the number of partitions.
The field named partition contains the partition index that has ownership of the node or element. Nodes or elements whose partition does not match the partition index of the mesh partition are also called ghost nodes or ghost elements respectively. These ghost entities merely exist to facilitate stencil operations (such as derivatives) or to complete, for instance, a mesh element.
The field named remote_index contains the location or local index of each node or element on the partition that owns it.
With the knowledge of partition and remote_index, it is possible to know, for each element or node, which partition owns it and at which local index therein. Usually the Atlas user will not be aware of these three fields, as they are only required for constructing Atlas' internal parallel communication capabilities.
Currently, Atlas provides two parallel communication classes that, given the three fields partition, remote_index and global_index, can apply parallel communication operations repeatedly as needed:
The GatherScatter class implements the communication operation that gathers data from all MPI tasks to one MPI task, and vice versa: the communication operation that scatters or distributes all data from one MPI task to all MPI tasks.
The HaloExchange class implements the communication operation that sends and receives data to and from MPI tasks containing nearest-neighbour partitions. This operation is typically required when synchronising the halos of ghost entities surrounding a domain partition.
These parallel communication classes form building blocks that provide parallel
capabilities to the FunctionSpace class, which can manage the gathering, scattering
or halo-exchanging of Fields.
4.5 FunctionSpace
The FunctionSpace class is introduced because a Field (Section 4.6) can be discretised on the computational domain in various ways: e.g. on a grid, on mesh nodes, on mesh cell centres, or as spectral coefficients. The representation of a given variable is intimately related to the spatial numerical discretisation strategy one wants to adopt (e.g. finite volume, spectral element, spectral transform, etc.). In addition to interpreting how a Field is discretised, the FunctionSpace also manages how the Field is parallelised and laid out in memory. It implements parallel operations such as gather and scatter, reduce-all and point-to-point communications, thus enabling the practical use of fields within parallel numerical algorithms.
In Atlas, the FunctionSpace concept, depicted in Figure 30, is implemented in a modular OO paradigm that allows adding as many different function spaces as required. This modularity allows third-party applications to extend the library with their own FunctionSpaces while still profiting from the parallelisation primitives provided by Atlas (highlighted in dashed blue in Figure 30).
Figure 30: FunctionSpace implementations including building blocks required to interpret Fields and abstract parallelisation.
The currently implemented FunctionSpace classes include NodeColumns, EdgeColumns, StructuredColumns and Spectral:
The NodeColumns function space class describes the discretisation of fields
with values collocated at the nodes of the mesh, horizontally, and may have
multiple layers defined in the vertical direction. Parallelisation is defined
in the horizontal plane, so that complete vertical columns are available on
each partition. The memory layout for fields defined using the NodeColumns
function space is illustrated in Figure 31. A HaloExchange object and
GatherScatter object are responsible for the necessary parallel operations
(Section 4.4). The NodeColumns function space also implements some simple
additional features, such as calculating global minimum and maximum values
of fields as well as some global reduction computations such as arithmetic
mean values.
The EdgeColumns function space class describes the discretisation of fields
with values collocated at the edges of the mesh, and may have multiple layers
defined in a vertical direction. The various operations just described for the
NodeColumns class are also available for this class.
The StructuredColumns function space class describes the discretisation of distributed fields on a Structured grid object. Currently the Structured grid must be Gaussian (see Section 4.2), because the function space delegates its parallel primitives to a specific Trans object that only supports Gaussian grids. As the Trans object is an interface to an external library that implements spectral transformations, we do not report the details here, but it is a good example of interfacing with pre-existing high performance codes. In a future release the parallelisation will be generalised to use a GatherScatter object instead, which does not rely on having a Gaussian grid. A field described using this function space, like the two above, can also have vertical levels.
The Spectral function space class describes a field in terms of vertical layers
of horizontal spherical-harmonics (global spherical representation). The
parallelisation (gathering and scattering) is again delegated to the Trans
object.
Figure 31: Memory layout for fields discretised using the NodeColumns function space. A vertical column is contiguous in memory, and can be indexed using direct addressing. N stands for the number of vertical layers.
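In other words, with vertical columns stored contiguously as in Figure 31, the flat memory index of the value at node jnode and level jlev can be computed as sketched below (the explicit helper is an illustration inferred from the figure, not an Atlas API):

    // Flat index into a NodeColumns-style layout with N contiguous levels per node.
    inline size_t flat_index( size_t jnode, size_t jlev, size_t N )
    {
      return jnode * N + jlev;  // column jnode occupies entries [jnode*N, jnode*N + N)
    }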
Listings 8 and 9 are provided to help understand how a FunctionSpace can be used in practice to create a field and perform a halo exchange on this field; they show the C++ and the Fortran code, respectively.
NodeColumns functionspace( mesh, Halo(1) );
Field field = functionspace.createField<double>( field::levels(100) );
functionspace.haloExchange( field );
Listing 8: C++ FunctionSpace example use
type(atlas_functionspace_NodeColumns) :: functionspace
type(atlas_Field)                     :: field
functionspace = atlas_functionspace_NodeColumns( mesh, halo=1 )
field = functionspace%create_field( atlas_real(8), levels=100 )
call functionspace%halo_exchange( field )
Listing 9: Fortran FunctionSpace example use
4.6 Field
The Field class contains the values of a full scalar, vector or tensor field. The Field values are stored contiguously in memory, and moreover they can be mapped to an arbitrary indexing mechanism to target a specific memory layout. The ability to adapt the memory layout to match, for instance, the most efficient data access patterns of a specific hardware is a key feature of Atlas. A Field also contains Metadata which stores simple information like a name, units, or other relevant information. The composition of the Field class is illustrated in Figure 32.
Figure 32: Field composition.
A Field delegates the access and storage of the actual memory to an Array that accommodates memory storage on heterogeneous hardware¹. If the Field is associated with a particular FunctionSpace, then the Field also contains a reference to it.
A FunctionSpace, as mentioned, permits the definition of parallel operations to be carried out on a given field. It defines a memory layout and is related to a particular spatial discretisation.
Fields can also be grouped together into one or more FieldSets. They can then be accessed from the FieldSet by name or by index. In C++, access to the actual field data is via a make_view<Value,Rank>() construct that creates a view of the field data with a multi-dimensional indexing accessor. In Fortran, the data is directly accessed through the multi-dimensional array intrinsics of the language. Practical use of the Field, using both C++ and Fortran, is given in Listings 10 and 11.
¹ The Array is responsible for synchronising data across the device (e.g. a GPU) and the host (e.g. a CPU).
FieldSet fields;
fields.add( functionspace.createField<double>( field::name("temperature"),
                                               field::levels(nb_levels) ) );
fields.add( functionspace.createField<double>( field::name("pressure"),
                                               field::levels(nb_levels) ) );
Field field_T = fields["temperature"];
Field field_P = fields["pressure"];

// Create (2D) views of the fields to access the data
auto T = make_view<double,2>( field_T );
auto P = make_view<double,2>( field_P );

for( size_t jnode=0; jnode<nb_nodes; ++jnode ) {
  for( size_t jlev=0; jlev<nb_levels; ++jlev ) {
    // T(jnode,jlev) = ...
    // P(jnode,jlev) = ...
  }
}
Listing 10: C++ Field base class
More detail on the Array and ArrayView classes can be found in Section 5.
type(atlas_FieldSet) :: fields
type(atlas_Field)    :: field_T, field_P
real(8), pointer     :: T(:,:), P(:,:)

fields = atlas_FieldSet()
call fields%add( functionspace%create_field( kind=atlas_real(8), name="temperature", levels=nb_levels ) )
call fields%add( functionspace%create_field( kind=atlas_real(8), name="pressure", levels=nb_levels ) )
field_T = fields%get("temperature")
field_P = fields%get("pressure")
call field_T%data(T)
call field_P%data(P)

do jnode = 1, nb_nodes
  do jlev = 1, nb_levels
    ! T(jlev,jnode) = ...
    ! P(jlev,jnode) = ...
  enddo
enddo
Listing 11: Fortran Field base class
4.7 Mathematical Operations
Many NWP and climate models contain algorithms to perform a variety of mathematical operations on fields, such as computing derivatives or integrals. These operations are common to various applications, and relate closely to certain spatial discretisations or function spaces. Atlas provides implementations for some of these operations given a field that is compatible with the related FunctionSpace (Section 4.5). Figure 33 sketches the philosophy adopted by Atlas regarding how to provide these operators.
Figure 33: Left: general design of numerical operators. Right: derivative, divergence, curl, and Laplacian implemented in the Nabla vector operator specific for a finite volume Method [15].
The concrete implementation of the Method concept uses the FunctionSpace and Field classes, both required to generate a concrete numerical method. Atlas currently provides an fvm::Method class, which contains everything required to construct mathematical operators using an edge-based finite volume scheme [15]. A concrete fvm::Nabla operator then implements the actual numerical algorithm using the fvm::Method. Listing 12 details the practical construction of the fvm::Method and how the gradient of a scalar field defined on NodeColumns is computed. Note that this implementation can compute gradients of three-dimensional fields (with vertical levels), but only computes the horizontal components.
fvm::Method method( mesh );
Nabla nabla( method );

Field scalar_field   = method.nodeColumns().createField<double>( field::levels(100) );
Field gradient_field = method.nodeColumns().createField<double>( field::levels(100),
                                                                 field::variables(2) );

/* ... code missing that sets up the scalar_field ... */

nabla.gradient( scalar_field, gradient_field );
Listing 12: C++ numerical operator Nabla that computes the gradient of a scalar field
5 Accelerator Support
Atlas is used as the abstraction layer of the underlying grid for implementing the numerical operators of many of the dwarfs proposed in the ESCAPE project. In order to support the port of the different dwarfs to heterogeneous architectures, the Atlas library needs to be extended. In particular for NVIDIA GPUs, the memory space of the accelerator is separate from that of the CPU that manages it. In this section we describe the developments performed in order to support GPUs by encapsulating the data management of the Atlas mesh and field data structures into the application programming interface (API) of Atlas.
In a first phase, different approaches and strategies to support Atlas data structures on accelerators were studied. Since the ESCAPE DSL (based on the GridTools library) will also use the Atlas mesh data structures to implement numerical operators on irregular grids on the sphere (Task 2.3), the interoperability of the Atlas data structures and the ESCAPE DSL library is an important aspect.
Three possible strategies were evaluated:
1. Allocate mirror storages for the accelerator memory within the existing data structures using the CUDA API for memory management, and provide functions to synchronise the CPU and accelerator memory spaces. In order to inter-operate with the DSL, a converter between GridTools and Atlas is required.
2. Replace the existing Atlas data structures by GPU-capable GridTools storages.
3. A hybrid solution where both options, the Atlas native data structures and GridTools storages, are supported.
Using the GridTools storage has the advantage that its storage management framework already solves the problem of supporting storages in the accelerator memory and provides an API to operate on them and to synchronise the CPU and accelerator copies. Additionally, the framework allows the most efficient memory layout to be chosen flexibly for different computing architectures. Finally, an integration of the GridTools storage as the underlying data management layer for Atlas data structures provides a high level of interoperability with the DSL, as it uses the same storages. Therefore option 3 was chosen, since it benefits from the aforementioned advantages and at the same time retains the native Atlas storage implementation and does not enforce a dependency on the GridTools library.
5.1 GridTools storage layer
The GridTools storage module provides flexible data structures for storing fields on a grid, with support for GPU accelerators.
Usually, the native storages of programming languages like C++ or Fortran do not allow the memory layout of the space (and extra) dimensions of a field to be specified. This is a crucial functionality for performance portability, since different algorithm motifs and computing architectures require different memory layouts for efficient memory access and to increase the data locality of the algorithm.
The C++ data structures of GridTools are very general and allow properties like the dimensionality of the field, memory layout, alignment, accelerator support, etc. to be customised. Listing 13 shows an example of the creation of a customised GridTools storage. The memory layout is abstracted in this case by the CUDA backend, which chooses the optimal layout for GPUs.
using storage_info_t = storage_traits<Cuda>::storage_info_t<
    3,            // Rank
    halo<2,2,0>   // Halo of size 2 for indices i,j
>;
using data_store_t = storage_traits<Cuda>::data_store_t<
    double,           // Data type
    storage_info_t    // Data storage info
>;

// Horizontal wind field with Ni = Nj = 128 with 2 components (u,v)
data_store_t wind( 128, 128, 2 );
Listing 13: C++ Example of the GridTools storage API. It shows the creation of a storage for a rank-3 array of double precision with a halo of 2 grid points in the i and j dimensions.
This low-level GridTools storage framework is managed by the Atlas data structures, whose API abstracts the underlying implementation (native or GridTools storages) and allows the GPU and CPU memory spaces to be accessed and synchronised. Further details are provided in the following sections.
For more information on the low-level GridTools storage capabilities, refer to the GridTools developments [6].
5.2 Atlas and GridTools storage integration
The two main Atlas data structures that need to be supported for accelerators are the Array, used by the Atlas Field (Section 4.6), and the connectivity classes used by the Mesh (Section 4.3). Both types of Atlas data structures now have GPU support by means of the GridTools storage framework, and provide an API to manage the GPU memory space of the fields and connectivity tables.
Additionally, the make_view constructs of Atlas now support the creation of specific array views for the CPU and the GPU device. The Atlas ArrayView is used to interpret a given storage as an N-dimensional array by providing a parenthesis operator for accessing the data, similar to the Fortran syntax for accessing N-dimensional arrays. The GridTools storage has the advantage that the memory layout can be customised to provide an optimal layout for a specific computing architecture.
The UML class diagram for the Array class and its relation to ArrayView is shown in Figure 34. It shows how the Array abstracts its implementation to use either the Atlas native data storage or the GridTools storage.
Note
To take advantage of the GPU-related capabilities, the GridToolsDataStore implementation needs to be selected with the CUDA GPU backend. This is achieved via the compile-time options -DENABLE_GRIDTOOLS_STORAGE=ON -DENABLE_GPU=ON (see Section 3.3).
Listing 14 shows an example of the creation and use of a CPU array view. Similarly, Listing 15 demonstrates the creation of a storage that is cloned to the GPU, and the creation and use, within a CUDA kernel, of a device view.
Array* ds = Array::create<double>( nb_nodes, nb_levels );

// Create a host view to interpret the Array as a 2D storage of doubles
auto hv = make_host_view<double,2>( *ds );

for( size_t jnode=0; jnode<nb_nodes; ++jnode ) {
  for( size_t jlev=0; jlev<nb_levels; ++jlev ) {
    // hv(jnode,jlev) = ...
  }
}
Listing 14: C++ Example of creation of a host array view
Figure 34: UML diagram for the Array class and its relation to the ArrayView class. Key Array methods include cloneToDevice(), cloneFromDevice(), syncHostDevice() and valid(); views are created with the make_view, make_host_view and make_device_view constructs, where Rank stands for the number of dimensions or indices.
__global__
void kernel_ex( ArrayView<double,2> dv, size_t nb_levels )
{
  for( size_t jlev=0; jlev<nb_levels; ++jlev )
    dv( threadIdx.x, jlev ) = ...;
}

// Create an Atlas array
Array* ds = Array::create<double>( nb_nodes, nb_levels );

// Synchronise the GPU device copy of the array
ds->cloneToDevice();

// Create a (2D) view that can be used from a GPU kernel
auto dv = make_device_view<double,2>( *ds );

// GPU kernel computation that uses the array view
kernel_ex<<< functionspace.nb_nodes(), 1 >>>( dv, nb_levels );
cudaDeviceSynchronize();

// Synchronise the CPU copy of the array
ds->cloneFromDevice();

// Create a host view to interpret the Array as a 2D storage of doubles
auto hv = make_host_view<double,2>( *ds );

for( size_t jnode=0; jnode<nb_nodes; ++jnode ) {
  for( size_t jlev=0; jlev<nb_levels; ++jlev ) {
    // check the values computed
    // if( hv(jnode,jlev) == ... ) ...
  }
}
Listing 15: C++ Example of the creation of a device array view and its use within a GPU kernel using CUDA
Synchronisation protections
Once Atlas can create views of an array in multiple memory spaces (host and GPU device), the computation can lead to invalid states of the array if both the host and the GPU views update their corresponding memory space without synchronising them accordingly, as shown in the following example:
// Create an Atlas array
Array* ds = Array::create<double>( nb_nodes, nb_levels );

// Synchronise the GPU device copy of the array
ds->cloneToDevice();

// Create a view that can be used from the CPU
auto hv = make_host_view<double,2>( *ds );

// Create a view that can be used from a GPU kernel
auto dv = make_device_view<double,2>( *ds );

// Modify the host view
for( size_t jnode=0; jnode<nb_nodes; ++jnode ) {
  for( size_t jlev=0; jlev<nb_levels; ++jlev ) {
    hv(jnode,jlev) = 0;
  }
}

// Modify the device view using a CUDA kernel
kernel<<< nb_nodes, 1 >>>( dv, nb_levels );
cudaDeviceSynchronize();

// At this point the two memory spaces are in a different state,
// invalidating both views
//   --> dv.valid() == false
//   --> hv.valid() == false
Listing 16: C++ Example of the creation of two views (host and device) that concurrently modify their memory spaces, leading to an inconsistent state
The current state of a view can be checked at any time with the valid method.
In order to support multiple views that coexist in the same scope while avoiding invalid states, Atlas gives the possibility to create read-only views, which will never invalidate the state of other existing views, since they do not allow their data to be modified. Listing 17 shows an example of a valid use and coexistence of multiple views by making use of read-only views.
// Create an Atlas array
Array* ds = Array::create<double>( nb_nodes, nb_levels );

// Create a read-only host view
auto hv = make_host_view<double,2,true>( *ds );

// Create a write device view
auto dv = make_device_view<double,2,false>( *ds );
Listing 17: C++ Example of the creation and coexistence of multiple views by creating Atlas read-only views (last optional template parameter of the make_view constructs)
5.3 Fortran fields on accelerators
One of the main functionalities of Atlas is the support of Fortran bindings, so that Fortran numerical operators can be implemented using the underlying Atlas data structures for meshes and fields.
Therefore, the GPU-capable storages and the API to manage the data have been forwarded to the Fortran API of Atlas. One of the programming models employed by the ESCAPE project to port Fortran numerical operators to the GPU is OpenACC.
OpenACC is a directive-based approach for porting Fortran operators to accelerators that allows the original implementation to be retained, using directives (comments in the Fortran code) to instruct the OpenACC compiler which loops should be parallelised on the GPU.
In order to support OpenACC development, Atlas connects the GPU-allocated pointers of the fields with the OpenACC GPU pointers. Listing 18 shows an example of how to use the GPU storages of Atlas to implement an OpenACC kernel on the GPU.
type(atlas_Field) :: field1
type(atlas_Field) :: field2
real(8), pointer  :: v1(:,:)
real(8), pointer  :: v2(:,:)

field1 = atlas_Field( kind=atlas_real(8), shape=[n,n] )
field2 = atlas_Field( kind=atlas_real(8), shape=[n,n] )

call field1%clone_to_device()
call field2%clone_to_device()
call field1%device_data( v1 )
call field2%host_data( v2 )

!$acc data present(v1) copyin(v2)
!$acc kernels
do j = 1, n
  do i = 1, n
    v1(i,j) = v2(i,j) + 42.
  enddo
enddo
!$acc end kernels
!$acc end data
Listing 18: Fortran Example of an OpenACC kernel operating on Atlas fields
6 Conclusions
The Atlas C++/Fortran library provides flexible data structures for both structured and unstructured meshes and is intended to be applied in NWP or climate modelling codes.
One of the key data structure components used in a model is the Field, which holds the actual data of a field variable. During the course of the ESCAPE project, the Field concept has been extended to be GPU-aware, so that new algorithmic approaches previously not possible can now be coded based on Atlas. One approach is based on the GridTools Domain Specific Language, which is going to be applied to several ESCAPE dwarfs. Thanks to the ESCAPE project, Atlas fields can now easily be used in a “host-device” combination of heterogeneous hardware. Including the expertise of partners using GPUs and developing GPU hardware at this early development stage of Atlas has been vital in developing a future-proof data structure library.
The grid and mesh generation facilities in Atlas have been extended as part of Deliverable D4.4 to include regional grids defined in projected coordinates (x, y) rather than geospherical coordinates (longitude, latitude). Thanks to the ESCAPE project, the Limited Area Modelling (LAM) community was involved at the early development stage of Atlas. It would have been a much more difficult task to redesign Atlas after the library had matured without the involvement of the LAM ESCAPE partners.
A new stable Atlas release for ESCAPE’s further dwarf developments is now
established with this deliverable, and has been tested and compiled with various
compilers and computer architectures.
7 References
[1] ESCAPE Deliverable D1.1, http://www.hpc-escape.eu/media-hub/escape-pub/escape-deliverables, Accessed: 31-03-2017.
[2] ESCAPE Deliverable D1.2: Batch 2: Definition of novel Weather & Climate Dwarfs, provision of prototype implementations and dissemination to other WPs, http://www.hpc-escape.eu/media-hub/escape-pub/escape-deliverables, Available: 31-12-2017.
[3] ESCAPE Deliverable D4.4: Atlas extension for LAM use, http://www.hpc-escape.eu/media-hub/escape-pub/escape-deliverables, Accessed: 31-03-2017.
[4] ESCAPE Deliverable D2.4: Domain-specific language (DSL) for dynamical cores on unstructured meshes/structured grids, http://www.hpc-escape.eu/media-hub/escape-pub/escape-deliverables, Available: 31-12-2017.
[5] NVIDIA CUDA toolkit documentation, http://docs.nvidia.com/cuda/, Accessed: 31-03-2017.
[6] T. Gysi, C. Osuna, O. Fuhrer, M. Bianco, and T. C. Schulthess, “STELLA: A domain-specific tool for structured grid methods in weather and climate models”, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’15, Austin, Texas: ACM, 2015, 41:1–41:12, ISBN: 978-1-4503-3723-6. DOI: 10.1145/2807591.2807627. [Online]. Available: http://doi.acm.org/10.1145/2807591.2807627.
[7] ESCAPE Deliverable D5.6: Establish software collaboration platform, http://www.hpc-escape.eu/media-hub/escape-pub/escape-deliverables, Accessed: 31-03-2017.
[8] Unified Modeling Language main webpage, http://www.uml.org, Accessed: 31-03-2017.
[9] M. Hortal and A. Simmons, “Use of reduced Gaussian grids in spectral models”, Monthly Weather Review, vol. 119, pp. 1057–1074, 1991.
[10] S. Malardel, N. Wedi, W. Deconinck, M. Diamantakis, C. Kühnlein, G. Mozdzynski, M. Hamrud, and P. Smolarkiewicz, “A new grid for the IFS”, ECMWF Newsletter, vol. 146, pp. 23–28, 2016.
[11] W. Deconinck, M. Hamrud, C. Kühnlein, G. Mozdzynski, P. Smolarkiewicz, J. Szmelter, and N. Wedi, “Accelerating extreme-scale numerical weather prediction”, in Parallel Processing and Applied Mathematics, Springer, 2016, pp. 583–593.
[12] P. Leopardi, “A partition of the unit sphere into regions of equal area and small diameter”, Electronic Transactions on Numerical Analysis, vol. 25, no. 12, pp. 309–327, 2006.
[13] G. Mozdzynski, “A new partitioning approach for ECMWF’s Integrated Forecasting System (IFS)”, in Proceedings of the Twelfth ECMWF Workshop: Use of High Performance Computing in Meteorology, vol. 273, World Scientific, 2007, pp. 148–166.
[14] A. MacDonald, J. Middlecoff, T. Henderson, and J.-L. Lee, “A general method for modeling on irregular grids”, High Performance Computing Applications, vol. 25, no. 1, pp. 392–403, 2011.
[15] P. K. Smolarkiewicz, W. Deconinck, M. Hamrud, C. Kühnlein, G. Mozdzynski, J. Szmelter, and N. P. Wedi, “A finite-volume module for simulating global all-scale atmospheric flows”, Journal of Computational Physics, vol. 314, pp. 287–304, 2016.
Document History
Version Author(s) Date Changes
0.1 W. Deconinck 9/01/17 Initial layout
0.2 W. Deconinck 24/02/17 Design concepts
0.3 W. Deconinck 05/03/17 Installation instructions
0.4 W. Deconinck 19/03/17 LAM contributions
0.5 C. Osuna 20/03/17 Accelerator support
0.6 W. Deconinck 21/03/17 Version for internal review
1.0 W. Deconinck 30/03/17 Final version
Internal Review History
Internal Reviewers Date Comments
Daan Degrauwe, Joris Van Bever 25/03/17 Approved with comments
Carlos Osuna 27/03/17 Approved with comments
Effort Contributions per Partner
Partner Efforts
ECMWF 6 pm
RMI 0.5 pm
MeteoSwiss 5.5 pm
Total 12 pm
ECMWF Shinfield Park Reading RG2 9AX UK
Contact: peter.bauer@ecmwf.int

The statements in this report only express the views of the authors and the European Commission is not responsible for any use that may be made of the information it contains.