SIParCS 2024 - Kamil Yousuf
Developing and Evaluating Methods for Distributing Observations in DART using MPI
The Data Assimilation Research Testbed (DART) assimilates real-world observations into model states to forecast weather and other Earth system processes. To improve DART's performance, the assimilation is run in parallel across many processes and nodes of the Derecho supercomputer. One component that has not been parallelized, however, is observation memory: each process holds its own copy of the real-world observation sequence. This approach has served DART well so far, but as technology advances and observations are taken more frequently, the observation sequence continues to grow. Furthermore, many AI models, such as Pangu Weather, require millions (possibly billions) of observations across many time windows to be quickly accessible. Because the entire observation sequence must fit in the memory available to each process, its size is severely constrained. This project addresses this limitation by distributing the observation sequence across the processes used to conduct the assimilation. We developed and evaluated two models that give each process quick and easy access to the observations it needs to 1) compute its assigned forward operators and 2) assimilate its assigned portion of the state. The first model uses MPI's collective operations to provide each process with a time-sorted subset of the observation sequence; the second uses MPI's one-sided operations to let processes directly access other processes' memory. Each model has benefits and drawbacks, which we evaluate with several benchmarks of the models' CPU and memory performance.
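To illustrate the index bookkeeping behind the two models, here is a minimal serial sketch in Python that simulates both without calling MPI. It rests on an assumption that is ours and not taken from the project: observations are sorted by time and split into near-equal contiguous blocks, one block per process. The names `Obs`, `block_layout`, `scatter_sorted`, and `owner_of` are hypothetical. In the collective model, `block_layout` produces the `sendcounts`/`displs` arrays that `MPI_Scatterv` expects. In the one-sided model, `owner_of` computes the `target_rank` and `target_disp` a process would pass to `MPI_Get`, assuming each rank exposes its block through an RMA window (for example, one created with `MPI_Win_create`).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    """Hypothetical stand-in for one entry of an observation sequence."""
    time: float   # observation time (units are an assumption, e.g. seconds)
    value: float  # observed value


def block_layout(n_obs: int, n_ranks: int) -> tuple[list[int], list[int]]:
    """Counts and displacements for a near-even contiguous split.

    The first n_obs % n_ranks ranks receive one extra observation. The two
    lists have the shape of MPI_Scatterv's sendcounts and displs arguments.
    """
    base, extra = divmod(n_obs, n_ranks)
    counts = [base + (1 if r < extra else 0) for r in range(n_ranks)]
    displs = [0] * n_ranks
    for r in range(1, n_ranks):
        displs[r] = displs[r - 1] + counts[r - 1]
    return counts, displs


def scatter_sorted(obs: list[Obs], n_ranks: int) -> list[list[Obs]]:
    """Collective model (simulated): sort by time, then give each rank a contiguous slice."""
    ordered = sorted(obs, key=lambda o: o.time)
    counts, displs = block_layout(len(ordered), n_ranks)
    return [ordered[d:d + c] for c, d in zip(counts, displs)]


def owner_of(global_index: int, n_obs: int, n_ranks: int) -> tuple[int, int]:
    """One-sided model (simulated): locate a time-sorted observation.

    Returns (rank, offset): the rank whose window holds the observation and
    its position inside that rank's block, i.e. the target_rank and
    target_disp (in elements) that a process would pass to MPI_Get.
    """
    if not 0 <= global_index < n_obs:
        raise IndexError(f"observation {global_index} out of range [0, {n_obs})")
    counts, displs = block_layout(n_obs, n_ranks)
    for r in range(n_ranks):
        if displs[r] <= global_index < displs[r] + counts[r]:
            return r, global_index - displs[r]
    raise AssertionError("unreachable: the blocks cover every valid index")


if __name__ == "__main__":
    sample = [Obs(time=float(t), value=10.0 * t) for t in [5, 1, 4, 2, 3, 0, 6]]
    for rank, block in enumerate(scatter_sorted(sample, n_ranks=3)):
        print(rank, [o.time for o in block])
    print(owner_of(6, n_obs=7, n_ranks=3))
```

With 7 observations on 3 ranks, the time-sorted blocks hold 3, 2, and 2 observations, and observation 6 lives at offset 1 on rank 2. Because both the layout and the owner lookup are pure arithmetic on `n_obs` and `n_ranks`, every process can locate any observation locally, without exchanging messages first. That is what makes a one-sided `MPI_Get` practical under this (assumed) layout.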
Mentors: Helen Kershaw, Marlee Smith