New Report on local data recovery approaches suitable for weather and climate prediction

posted 25 Mar 2020, 02:07 by Daniel Thiemert   [ updated 25 Mar 2020, 02:12 ]

Numerical weather and climate prediction ranks among the scientific applications whose accuracy improvements depend most heavily on growth in available computing power. As the number of cores in top computing facilities pushes into the millions, the increasing average frequency of hardware and software failures forces users to review their algorithms and systems in order to protect simulations from breakdown.

A new ESCAPE-2 report surveys approaches to fault tolerance in numerical algorithms and to system resilience in parallel simulations, from the perspective of numerical weather and climate prediction systems.

The report analyses a selection of existing strategies: interpolation-restart and compressed checkpointing on the numerical side, and in-memory checkpointing, methods based on User-Level Failure Mitigation (ULFM), and backup-based methods on the system side. Numerical examples showcase how the techniques perform in addressing faults, with particular emphasis on iterative solvers for linear systems, a staple of atmospheric fluid flow solvers.
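
To give a flavour of the interpolation-restart idea for iterative linear solvers, here is a minimal sketch in Python with NumPy. It is not taken from the report: the test matrix (a 1D Laplacian), the fault location, and the helper names `cg` and `interpolate_lost` are all assumptions made for illustration. The sketch runs a few conjugate gradient steps, simulates the loss of a block of the iterate, rebuilds the lost entries by solving the small local system restricted to the lost rows, and then restarts the solver from the recovered iterate.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, max_iter=500):
    """Plain conjugate gradient for a symmetric positive definite A."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def interpolate_lost(A, b, x, lost):
    """Rebuild x[lost] from the surviving entries (hypothetical helper).

    Solves A[lost, lost] x[lost] = b[lost] - A[lost, kept] x[kept],
    so the residual on the lost rows is zero after recovery.
    """
    kept = np.setdiff1d(np.arange(len(b)), lost)
    rhs = b[lost] - A[np.ix_(lost, kept)] @ x[kept]
    x_rec = x.copy()
    x_rec[lost] = np.linalg.solve(A[np.ix_(lost, lost)], rhs)
    return x_rec

# Assumed test problem: 1D Laplacian, which is symmetric positive definite.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x_exact = np.linalg.solve(A, b)

# Run a few iterations, then simulate a fault that wipes part of the iterate.
x_partial = cg(A, b, np.zeros(n), max_iter=10)
lost = np.arange(20, 30)          # assumed indices held by the failed process
x_faulty = x_partial.copy()
x_faulty[lost] = 0.0

# Interpolate the lost entries, then restart CG from the recovered iterate.
x_recovered = interpolate_lost(A, b, x_faulty, lost)
x_final = cg(A, b, x_recovered)

def a_norm_error(x):
    e = x - x_exact
    return np.sqrt(e @ A @ e)

print("A-norm error after fault:   ", a_norm_error(x_faulty))
print("A-norm error after recovery:", a_norm_error(x_recovered))
print("max error after restart:    ", np.max(np.abs(x_final - x_exact)))
```

In this sketch the recovery needs only the rows of the matrix and right-hand side that belong to the lost block, plus the surviving entries of the iterate, so no checkpoint of the iterate is required. The restart discards the Krylov search directions built before the fault, which is the price paid for avoiding that storage.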

The potential impact of these strategies is discussed in relation to the current development of numerical weather prediction algorithms and systems towards the exascale. Trade-offs between the performance, efficiency and effectiveness of resiliency strategies are analysed, and some recommendations are outlined for future developments.