SIParCS 2019 - Weile Wei
Using Cloud-Friendly Data Format in Earth System Models
The volume of data produced by earth system models is increasing rapidly every year, driven by advances in computing efficiency and increases in model resolution and complexity. To produce these data volumes efficiently, parallel Input/Output (I/O) is a necessity, and to help manage them, so is compression. Unfortunately, Network Common Data Form (NetCDF), the data format commonly used in earth system models, does not support parallel I/O with compression without significant performance loss. Zarr offers an alternative: a cloud-friendly data format that provides parallel (multi-threaded or multi-process) I/O for chunked, compressed, N-dimensional arrays. In this poster, we present our work on adding the Zarr/Z5 data format to the Community Earth System Model (CESM) by integrating it into the ParallelIO 2 backend library. We tested the I/O performance of Z5-enabled CESM on the National Center for Atmospheric Research's Cheyenne supercomputer. Our work enriches the Zarr/Z5 development ecosystem, provides flexibility in choosing I/O backends in CESM, and prepares CESM for object-based storage systems in cloud computing services.
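Zarr's approach to parallel, compressed I/O rests on chunking. An N-dimensional array is split into fixed-shape chunks, and each chunk is compressed and stored as a separate object (a file, or a key in a cloud object store). As a result, different processes can write different chunks without coordinating shared file offsets. The following is a minimal sketch of that idea using only the Python standard library. It assumes a dict as a stand-in for an object store and zlib as a stand-in for the compressor. The names SHAPE, CHUNKS, chunk_grid, write_chunk and read_array are hypothetical illustrations, not the Zarr, Z5 or ParallelIO APIs. Only the "i.j" chunk-key naming mirrors the Zarr v2 default.

```python
"""Minimal sketch of Zarr-style chunked, compressed storage (stdlib only).

Assumptions (hypothetical, for illustration): a dict stands in for an object
store, zlib stands in for the compressor, and threads stand in for MPI ranks.
This is not the Zarr, Z5 or ParallelIO API.
"""
import zlib
from array import array
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SHAPE = (4, 6)   # hypothetical 2-D field, e.g. (lat, lon)
CHUNKS = (2, 3)  # hypothetical chunk shape


def chunk_grid(shape, chunks):
    """Return the chunk indices covering the array (ceiling division per dim)."""
    counts = [-(-s // c) for s, c in zip(shape, chunks)]
    return list(product(*(range(n) for n in counts)))


def write_chunk(store, data, idx):
    """Compress one chunk and store it under its own key; no shared offsets."""
    ci, cj = idx
    rows = range(ci * CHUNKS[0], min((ci + 1) * CHUNKS[0], SHAPE[0]))
    cols = range(cj * CHUNKS[1], min((cj + 1) * CHUNKS[1], SHAPE[1]))
    buf = array("d", (data[r][c] for r in rows for c in cols))
    store[f"{ci}.{cj}"] = zlib.compress(buf.tobytes())


def read_array(store):
    """Decompress every chunk and reassemble the full array."""
    out = [[0.0] * SHAPE[1] for _ in range(SHAPE[0])]
    for ci, cj in chunk_grid(SHAPE, CHUNKS):
        buf = array("d")
        buf.frombytes(zlib.decompress(store[f"{ci}.{cj}"]))
        # Number of columns in this chunk (edge chunks may be narrower).
        ncols = min((cj + 1) * CHUNKS[1], SHAPE[1]) - cj * CHUNKS[1]
        for k, value in enumerate(buf):
            out[ci * CHUNKS[0] + k // ncols][cj * CHUNKS[1] + k % ncols] = value
    return out


if __name__ == "__main__":
    data = [[float(r * SHAPE[1] + c) for c in range(SHAPE[1])] for r in range(SHAPE[0])]
    store = {}
    # Each worker writes a different chunk independently of the others.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda idx: write_chunk(store, data, idx), chunk_grid(SHAPE, CHUNKS)))
    print(sorted(store))              # ['0.0', '0.1', '1.0', '1.1']
    print(read_array(store) == data)  # True
```

A commonly cited reason that compression complicates parallel writes to a single shared file is that a compressed chunk's size is only known after compression, so writers must agree on where each chunk lands. Giving every chunk its own key sidesteps that coordination, and the same property makes this layout a natural fit for object storage.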
Mentors: Haiying Xu, John Dennis, Kevin Paul