Cloud-native repositories for big scientific data | Computational and Information Systems Lab

Abernathey, R. P., Augspurger, T., Banihirwe, A., Blackmon-Luca, C. C., Crone, T. J., et al. (2021). Cloud-native repositories for big scientific data. Computing in Science & Engineering. https://doi.org/10.1109/MCSE.2021.3059437

Title	Cloud-native repositories for big scientific data
Genre	Article
Author(s)	R. P. Abernathey, T. Augspurger, Anderson Banihirwe, C. C. Blackmon-Luca, T. J. Crone, C. L. Gentemann, Joseph J. Hamman, N. Henderson, C. Lepore, T. A. McCaie, N. H. Robinson, R. P. Signell
Abstract	Scientific data have traditionally been distributed via downloads from data server to local computer. This way of working suffers from limitations as scientific datasets grow toward the petabyte scale. A "cloud-native data repository," as defined in this article, offers several advantages over traditional data repositories—performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access and inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing's full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.
Publication Title	Computing in Science & Engineering
Publication Date	Mar 1, 2021
Publisher's Version of Record	https://doi.org/10.1109/MCSE.2021.3059437
OpenSky Citable URL	https://n2t.org/ark:/85065/d7q52t1h
CISL Affiliations	TDD, IOWA