SIParCS 2021 - Bo Zhang
Evaluation of DataSpaces in Heterogeneous In-situ Workflow for GPU-MURaM at Exascale
With most next-generation exascale HPC clusters moving toward accelerators, computation-intensive scientific applications are being migrated to these accelerators for performance benefits. However, as the computation is optimized, coupled heterogeneous applications require higher I/O throughput to move data between them. Moreover, current coupling methods for such heterogeneous in-situ workflows are inefficient because resource utilization changes as both computation and storage workloads move to devices. To address these challenges, this SIParCS project explores integrating DataSpaces, an in-situ I/O coupling system, with the GPU version of the MPS/University of Chicago Radiative MHD (MURaM) simulation. We tested two approaches using DataSpaces: first, remote data staging with GPUDirect over OpenFabric, and second, local (in-node) staging for heterogeneous adaptation. The remote staging approach avoids unnecessary data movement to host memory, while the local staging approach makes efficient use of host resources left idle as the computation workload migrates to the device. Our preliminary evaluation with a synthetic benchmark emulating MURaM indicates that the GPUDirect remote staging approach is 2x slower than the baseline, while local staging gains a 1.27x speedup.
Mentors: Cena Miller & Supreeth Suresh
Slides and poster