New Report on local data recovery approaches suitable for weather and climate prediction

posted 25 Mar 2020, 02:07 by Daniel Thiemert   [ updated 25 Mar 2020, 02:12 ]

Numerical weather and climate prediction ranks among the scientific applications whose accuracy improvements depend strongly on the growth of available computing power. As the number of cores in top computing facilities pushes into the millions, the increasing average frequency of hardware and software failures forces users to review their algorithms and systems in order to protect simulations from breakdown.

A new ESCAPE-2 report surveys approaches for fault-tolerance in numerical algorithms and system resilience in parallel simulations from the perspective of numerical weather and climate prediction systems.

A selection of existing strategies is analysed: interpolation-restart and compressed check-pointing on the numerics side, and in-memory check-pointing, methods based on user-level failure mitigation, and backup-based methods on the systems side. Numerical examples showcase the performance of the techniques in addressing faults, with particular emphasis on iterative solvers for linear systems, a staple of atmospheric fluid flow solvers.
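As a rough illustration of the in-memory check-pointing idea applied to an iterative linear solver, the following Python sketch keeps a copy of the iterate in memory and rolls back to it when a fault is detected. It is not taken from the report: the Jacobi solver, the function names, the fault model (one entry overwritten with NaN) and the detection test are all assumptions chosen to keep the example small.

```python
import math


def jacobi_step(A, b, x):
    """One Jacobi sweep for A x = b, with A given as a list of rows."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def solve_with_checkpoints(A, b, iters=200, every=10, fault_at=None):
    """Jacobi iteration with in-memory checkpoint and rollback.

    fault_at is a hypothetical fault model: at that iteration one entry
    of the iterate is overwritten with NaN, exactly once.
    """
    x = [0.0] * len(b)
    checkpoint = (0, list(x))      # (iteration count, copy of the state)
    rollbacks = 0
    k = 0
    while k < iters:
        x = jacobi_step(A, b, x)
        k += 1
        if fault_at is not None and k == fault_at:
            x[0] = float("nan")    # simulated soft fault
            fault_at = None        # inject the fault only once
        if any(math.isnan(v) for v in x):
            # Fault detected: restore the last saved state and redo the work.
            k, saved = checkpoint
            x = list(saved)
            rollbacks += 1
            continue
        if k % every == 0:
            checkpoint = (k, list(x))
    return x, rollbacks
```

The checkpoint interval `every` expresses the trade-off discussed in the report summary: frequent checkpoints cost more copies, infrequent ones cost more recomputation after a fault.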

The potential impact of these strategies is discussed in relation to the current development of numerical weather prediction algorithms and systems towards the exa-scale. Trade-offs between performance, efficiency and effectiveness of resiliency strategies are analysed, and recommendations are outlined for future developments.

Successful 1st Dissemination Workshop

posted 30 Oct 2019, 01:42 by Daniel Thiemert   [ updated 30 Oct 2019, 01:42 ]

On 21-22 October, the first of two dissemination workshops sharing the ESCAPE-2 progress with the wider weather, climate and computing community was held at ECMWF in Reading. Over 60 participants joined the meeting from Europe and the United States, including academia as well as industry (hardware vendors).

Day 1 of the workshop was organised along the four main technical work packages. On day 2 this was supplemented by selected presentations from related projects (also EC funded) and working group sessions to discuss results and plan the work for the upcoming months.

One of the main success stories so far is the increasing maturity and visibility of the dwarf concept. Dwarfs are code components that represent generic functionalities in weather and climate models and that come with notorious computational bottlenecks; the concept allows technical developments to target these components specifically. It has introduced a flexible and agile code optimisation framework for work on both alternative numerical and algorithmic methodologies, and on hardware adaptation and optimisation. The workshop showed how well this has worked in the past and that the concept is already being embraced by other groups, in particular hardware developers for testing early versions of new technologies.

One of the objectives of ESCAPE-2 is to advance the dwarfs towards so-called High-Performance Climate and Weather (HPCW) benchmarks that deliver a more representative performance estimate of real-life applications on the available architectures. The workshop discussions confirmed the need for such benchmarks, but also exhibited several open questions such as licensing, degree of complexity permitted, and whether advanced concepts like domain-specific languages should be incorporated.

The other main achievement is the progress on domain-specific languages (DSLs), whose main components, such as front-ends, intermediate representations and hardware-specific back-ends, are already being tested. The workshop further confirmed the need for this development within the community and that vendors will support its assessment on their platforms. The discussions also raised hope that a full implementation and test at scale could be performed by the time the pre-exascale EuroHPC machines become available.

The VVUQ work of ESCAPE-2 has demonstrated the first ingestion of simple weather models into the URANIE framework and will take in models of increasing complexity, up to a full prediction system, in the remainder of the project. The workshop discussions concluded that the estimation of sensitive but uncertain parameters is a key asset of the tool, and that this feature will be explored with added focus in the next phase.

The workshop concluded with statements from vendors (IBM, Intel, NVIDIA) on the project's achievements and progress. All vendors offered strong in-kind support and were very keen to continue the collaboration on dwarf optimisation and the HPCW benchmarks. Once established and publicly available, these benchmarks are likely to create significant impact, both in terms of general HPC infrastructure performance assessments and as the baseline for compiler development and application tuning.

The proceedings and presentations are available on the workshop page.

New deliverables published

posted 8 Oct 2019, 01:13 by Daniel Thiemert   [ updated 8 Oct 2019, 01:14 ]

Deliverable D2.1 presented a comprehensive set of computational patterns derived from the dwarfs and from representative computations of the models participating in the ESCAPE-2 project. From each computational pattern, a set of language elements was derived that is needed to express that type of computation, such that the language elements are minimal, orthogonal, simple and non-redundant.

D2.3 High-level intermediate (HIR) representation specification now provides a full and formal specification of the high-level intermediate representation (HIR) for weather and climate applications. The specification is presented as a set of concepts following the language elements identified in D2.1. The concepts are organized as a tree that represents a full program. Each concept captures specific information about the computational patterns supported by the DSL toolchain. The semantics of each node are described, as well as its properties and the child nodes it supports. The HIR serves as an interface between a language frontend (D2.2) and the implementation of a compiler toolchain that generates an efficient, parallel and optimized implementation of a model described using a concise and compact frontend language.
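To make the idea of a tree-shaped intermediate representation more concrete, here is a minimal Python sketch of nodes that carry a kind, a set of properties and child nodes. The node kinds and property names (`Program`, `Stencil`, `Assign`, `FieldAccess`, `offset`) are hypothetical and are not taken from the D2.3 specification; the sketch only illustrates the general structure described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A generic tree node: a kind, its properties, and its children."""
    kind: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, node) pairs in depth-first, pre-order traversal."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# A hypothetical program: one stencil with a single assignment.
program = Node("Program", children=[
    Node("Stencil", {"name": "diffusion"}, [
        Node("Assign", {"target": "out"}, [
            Node("FieldAccess", {"field": "in", "offset": (0, 0)}),
        ]),
    ]),
])

for depth, node in walk(program):
    print("  " * depth + node.kind, node.props)
```

A frontend would build such a tree from source text, and later toolchain stages would traverse and transform it, which is why a uniform node shape is convenient.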

In D3.2 Assessment of the evaluation and verification tools based on the v0-benchmark configurations, a methodology is developed to assess the correctness of weather and climate benchmark models. The methodology is able to distinguish small deviations, typically due to round-off errors, a change in hardware environment, or a change in the software toolchain, from larger deviations, such as those caused by a software bug. Moreover, the methodology can be applied to the high-dimensional output that is generated by climate and weather models.
As such, this methodology will assist in porting climate and weather models to new hardware platforms. It will also help when comparing different algorithmic choices.
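The announcement does not describe how the D3.2 methodology separates the two kinds of deviation. Purely as an illustration of the general idea, the Python sketch below compares a candidate output field with a reference using a relative root-mean-square difference and a tolerance. The metric, the function names and the tolerance value are assumptions, not the method of the deliverable.

```python
import math


def rel_rms_diff(ref, cand):
    """Relative root-mean-square difference between two flat fields."""
    num = math.sqrt(sum((r - c) ** 2 for r, c in zip(ref, cand)))
    den = math.sqrt(sum(r * r for r in ref))
    return num / den if den > 0 else num


def classify(ref, cand, tol=1e-10):
    """Label a deviation as round-off level or suspect (assumed tolerance)."""
    return "round-off level" if rel_rms_diff(ref, cand) <= tol else "suspect"


reference = [1.0, 2.0, 3.0]
ported = [v * (1.0 + 1e-14) for v in reference]   # tiny platform-like change
buggy = [1.0, 2.0, 3.5]                           # clearly wrong value

print(classify(reference, ported))
print(classify(reference, buggy))
```

In practice a tolerance would need to be calibrated, for example from the spread between runs that are known to be correct, rather than fixed by hand as it is here.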

Both deliverables are available for download, together with previous public deliverables.

1st Dissemination Workshop - Programme published

posted 24 Sept 2019, 01:24 by Daniel Thiemert   [ updated 24 Sept 2019, 01:24 ]

The programme for the 1st ESCAPE-2 Dissemination Workshop has been published at

1st ESCAPE-2 Dissemination Workshop

posted 2 Jul 2019, 03:06 by Daniel Thiemert   [ updated 5 Sept 2019, 01:14 ]

ESCAPE-2 is organising its first Dissemination Workshop in Reading, UK, between the 21st and 22nd of October.

The workshop will be organised according to the following topics:

  • Algorithms and Mathematics
  • Programming Models and Domain Specific Languages
  • Weather and Climate Benchmarks
  • Verification, Validation and Uncertainty Quantification
  • Links with other efforts in the community
Details (venue, registration, etc.) can be found on the workshop website.

First Definition of a common and generic framework for VVUQ assessments

posted 27 Jun 2019, 06:18 by Daniel Thiemert   [ updated 27 Jun 2019, 06:18 ]

One of the main objectives of the ESCAPE-2 project is to provide a Validation, Verification and Uncertainty Quantification (VVUQ) package based on the URANIE platform (an open-source VVUQ toolbox) for weather and climate simulations. In this context, deliverable D4.1 has been published. It presents the first essential step, which is the definition of a VVUQ framework shared by the two communities:

  • weather and climate modeling on one side, and 
  • the URANIE community on the other side, which historically covers energy, defense and security.
This is essential to allow for efficient communication between the two communities and to identify possible synergies in the ESCAPE-2 project early on.

After introducing some URANIE concepts (for example the fact that it uses a non-intrusive approach and sees numerical simulation as a black box), the notions of Verification and Validation are discussed to converge towards the same definition between both communities: 

  • Verification is the step of checking the capability of the numerical model to solve the mathematical equations of the model. It focuses solely on numerical aspects.
  • Validation is the comparison between reality (represented by experiments) and the numerical model.

Both the URANIE and weather/climate communities deal with uncertainties in numerical simulations. While the whole set of error sources is represented within the Ensemble Prediction System for Numerical Weather Prediction, URANIE only considers random perturbations in model parameters, also called physical parameters, when applying non-intrusive Uncertainty Quantification (UQ) techniques. Both communities agree that making a distinction between the various sources of uncertainty (model parameters, initial conditions, etc.) is a key point for the remainder of the project.
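The non-intrusive approach described above can be sketched in a few lines of Python: the model is treated as a black box, a physical parameter is sampled at random, and statistics are computed on the outputs. This does not use the URANIE API; the toy model, the Gaussian parameter distribution and all names are assumptions made for illustration.

```python
import math
import random
import statistics


def toy_model(decay_rate, t=1.0):
    """Stand-in for a black-box simulation with one physical parameter."""
    return math.exp(-decay_rate * t)


def propagate(model, mean, std, n_samples=1000, seed=0):
    """Monte Carlo propagation of a Gaussian parameter through the model.

    The model is only called, never modified (non-intrusive).
    """
    rng = random.Random(seed)
    outputs = [model(rng.gauss(mean, std)) for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)


out_mean, out_std = propagate(toy_model, mean=1.0, std=0.1)
print(out_mean, out_std)
```

Perturbing initial conditions instead of a physical parameter would follow the same pattern with a different sampled input, which is one reason the distinction between uncertainty sources matters for how a study is set up.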

Successful VVUQ Workshop

posted 30 Apr 2019, 07:10 by Daniel Thiemert   [ updated 30 Apr 2019, 07:11 ]

The VVUQ workshop (part of WP4 of the project) took place on 23 and 24 April in Saint-Rémy-les-Chevreuse, near CEA Saclay.
Extensive descriptions of the VVUQ methodologies as practised in both the URANIE and NWP communities were presented, allowing the various members of the project to become more familiar with both approaches.

Thanks to the presentation of W. Edeling, the link was also made with the VECMA H2020 project where UQ analysis plays a critical role.

The workshop thus helped to clarify:
  1. The common notions that everyone should share (physical parameters vs initial conditions, verification vs validation, etc.). Those concepts will form the backbone of the deliverable to come.
  2. The work that will be performed in the rest of the project, notably the fact that the focus will mainly be on the UQ analysis of physical parameters.
  3. The functionalities of the URANIE tool in depth, as well as first results on how to apply it for NWP analysis.
Finally, in parallel with the workshop, BSC and CEA discussed technical aspects of the current status of the URANIE code and how to improve the workflow internals and specific I/O management issues.

The main outcome of the workshop will be deliverable D4.1, to be published in June.

Presentations are available on the workshop page.

Definition of a Domain Specific Language for weather and climate prediction

posted 12 Apr 2019, 04:29 by Daniel Thiemert   [ updated 12 Apr 2019, 04:31 ]

DSL Tool Chain
One of the aims of WP2 of ESCAPE-2 is to define, develop and apply a domain-specific language (DSL) toolchain applicable to a comprehensive list of algorithmic motifs (dwarfs) in weather and climate prediction. Domain-specific languages are powerful tools that provide programming environments in which numerical scientific algorithms can be written in a concise, high-level language.

The weather and climate domain is characterized by very specific algorithmic motifs derived from the discretization of the numerical methods employed in the mathematical models, the specific aspect ratio of horizontal to vertical grids in regional and global models, and the use of sub-gridscale parametrization characterized by different algorithmic patterns.

This motivates the development of a description suited to these domain characteristics, using a highly concise and readable language. Details such as explicit loops, the ordering of the loop nest, data layout and optimizations such as tiling are hardware-specific and are abstracted away by such a high-level language.
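The separation between what is computed and how the grid is traversed can be illustrated with a small Python sketch. It is not the ESCAPE-2 DSL syntax: `laplacian` and `apply_stencil` are hypothetical names, and the "backend" here is a plain double loop standing in for whatever loop order, layout or tiling a real toolchain would choose.

```python
def laplacian(field, i, j):
    """Pointwise description: what is computed at (i, j), not how to loop."""
    return (field[i - 1][j] + field[i + 1][j]
            + field[i][j - 1] + field[i][j + 1]
            - 4.0 * field[i][j])


def apply_stencil(stencil, field):
    """Hypothetical backend: owns the traversal of the interior points.

    Loop order, tiling or parallelization could change here without
    touching the stencil description. Boundary points are left at 0.0.
    """
    ni, nj = len(field), len(field[0])
    out = [[0.0] * nj for _ in range(ni)]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            out[i][j] = stencil(field, i, j)
    return out


# Example field f(i, j) = i^2 + j^2, whose discrete Laplacian is 4.
grid = [[float(i * i + j * j) for j in range(4)] for i in range(4)]
result = apply_stencil(laplacian, grid)
print(result[1][1], result[2][2])
```

The user-facing part is the three-line stencil; everything in `apply_stencil` is the kind of detail a DSL toolchain is meant to generate and optimize per architecture.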

Among other things, the DSL abstracts away all the details of an efficient parallel implementation and of the hardware-dependent programming models and optimizations. There are several examples of DSLs being developed and applied to production weather and climate models, such as COSMO GridTools (Gysi et al., 2015), PSyclone for the LFRic model (Adams et al.) and the CLAW DSL for column-based parameterizations (Clement et al.).

In contrast to existing approaches, which are normally developed for one particular model, the ESCAPE-2 DSL aims at a toolchain that supports a wide range of models, numerical methods and grids, by adopting a modular design in which domain-specific frontends or optimizers can easily be incorporated. Additionally, most existing approaches provide a prescriptive language, where the user still has to supply information that is crucial for parallelizing the algorithm and obtaining good performance. Instead, the goal is to provide a high-level descriptive language in which the algorithms are described in a sequential manner. The parallelization and optimization implementations are derived by the set of optimizers incorporated in the toolchain.

The recently finalized deliverable D2.1 "High-level domain-specific language (DSL) specification" presents a first definition of a high-level DSL that is capable of supporting most of the computational patterns present in the models that participate in the ESCAPE-2 project. The document is the outcome of an iterative process and discussions between DSL experts and key model developers. It delivers an extensive and comprehensive grammar from a holistic perspective, where the aim is to provide a unified standard language and HIR (high-level intermediate representation) definition that can deal with a broad spectrum of methods and models. The proposal is a change of paradigm relative to the current state of the art in abstractions and DSLs, where the established tools support only a restricted set of methods or models.

The grammar defined in this document will be the basis of a DSL frontend (D2.2, September 2019) and toolchain implementation (D2.5, May 2021) that will be demonstrated on a set of dwarfs with representative computations from the different models that participate in the project (D2.4, September 2020).

Workshop on VVUQ

posted 14 Feb 2019, 08:01 by Daniel Thiemert   [ updated 14 Feb 2019, 08:02 ]

The ESCAPE-2 project will organise a workshop on VVUQ on 23 and 24 April in France. This workshop will serve as a basis for the definition of a common VVUQ (Verification, Validation and Uncertainty Quantification) framework to be applied in Numerical Weather Prediction. Both the URANIE and NWP communities need to agree on a common language and framework to be able to apply VVUQ analysis to the various dwarfs and codes at hand in the ESCAPE-2 project. It is part of the deliverables of WP4.

This workshop will also serve as a basis for exchange on the coming tasks of WP4 concerning the handling of high dimensionality. The first discussions on how this should be handled, and on the potential approaches, will take place. More here.

Successful Workshop on fault-tolerant algorithms and resiliency approaches

posted 28 Jan 2019, 04:10 by Daniel Thiemert   [ updated 28 Jan 2019, 04:11 ]

ESCAPE-2 organised a workshop on fault-tolerant algorithms and resiliency approaches on the 23rd and 24th of January 2019 in Milan, Italy. The workshop consisted of a first day of seminars by experts in systems resilience and fault-tolerant numerical algorithms, and a second day of scientific discussions between the same experts and project participants. The presentations gave a detailed picture of the state of the art in the field and established connections with operational workflows and numerical algorithms used in atmospheric applications.

During the discussion sessions, participants explored in more detail how to complement existing numerical weather and climate prediction models with resilience and fault-tolerance techniques. Specific recommendations included benchmarking NWP data volume and operational requirements, pairing fault-tolerant algorithms with system resilience in consistent workflows, coordinating with vendors to provide detailed hardware fault information, and embedding fault-tolerance in domain-specific language programming paradigms.

The conclusions of the workshop will feature in a white paper to be submitted as an ESCAPE-2 project deliverable, and will inform the investigation of hardware and software resiliency tools within existing and future ESCAPE-2 project dwarfs. Presentations can be found on the workshop page.
