SIParCS 2024 - Rachel Tam
Scaling UXarray: Bridging the Gap for High-Performance Unstructured Grid Analysis through Dask and Documentation Enhancements
Global climate models (GCMs) now operate with unstructured dynamical cores that permit explicitly resolving nonhydrostatic motions and regional resolution refinement. Models that run at storm-resolving resolutions generate large volumes of unstructured grid data that must be analyzed at scale. This has created a need for scalable analysis algorithms of the kind that are already commonplace for structured grids thanks to packages such as Xarray and Dask. UXarray is an open-source Python package that enables the analysis and visualization of such datasets without the need to regrid them to structured grids.
This internship first assessed the state of UXarray's existing analysis routines, which were written primarily with NumPy. Dask was then leveraged heavily to make UXarray viable for analyzing the vast volumes of unstructured GCM output, as demonstrated by the revision of the existing topological aggregation routines and the implementation of a new weighted-averaging routine. Benchmarks on high-resolution unstructured grids were performed at scale to quantify UXarray's performance.
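As a rough illustration of why Dask suits a weighted-averaging routine, the sketch below computes an area-weighted mean over per-face values held in chunked Dask arrays. It is not UXarray's implementation: the function name `weighted_mean`, the variable names, the random data, and the chunk size are all assumptions made for this example, and only NumPy and `dask.array` calls are used.

```python
import numpy as np
import dask.array as da


def weighted_mean(values, weights):
    """Weighted mean of per-face values: sum(w * x) / sum(w)."""
    # Both sums are built lazily as a task graph over the chunks;
    # compute() then evaluates them chunk by chunk.
    return ((values * weights).sum() / weights.sum()).compute()


# Hypothetical data: one value and one area weight per grid face.
rng = np.random.default_rng(0)
n_face = 1_000_000
values_np = rng.random(n_face)
areas_np = rng.random(n_face) + 0.5  # strictly positive weights

# Wrap the arrays in Dask with matching chunks so they align elementwise.
face_values = da.from_array(values_np, chunks=100_000)
face_areas = da.from_array(areas_np, chunks=100_000)

result = weighted_mean(face_values, face_areas)
```

Because each chunk contributes only a partial sum, the same pattern applies when the arrays are too large to hold in memory at once, which is the situation storm-resolving output tends to create.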
Documentation was also written for the new routines and for the package in general, including workflow examples that use output from the Department of Energy (DOE) Energy Exascale Earth System Model v2 (E3SM-v2) and performance showcases that pair UXarray with Dask.
Mentors: Philip Chmielowiec, Orhan Eroglu
Slides and poster