Accelerating CMIP data analysis with parallel computing in R
Milroy, D., Chen, S., Vanderwende, B., Hammerling, D. (2017). Accelerating CMIP data analysis with parallel computing in R. doi:https://doi.org/10.5065/D61V5CP8
Title | Accelerating CMIP data analysis with parallel computing in R |
---|---|
Genre | Technical Report |
Author(s) | Daniel Milroy, Sophia Chen, Brian Vanderwende, Dorit Hammerling |
Abstract | In this Technical Note we examine eight schemes for parallelizing Extreme Value Analysis (EVA) on Coupled Model Intercomparison Project data via R foreach, doParallel, and doMPI packages. We perform strong scaling studies to delineate the performance impacts of factors such as R cluster type (TCP/IP sockets and MPI), communication protocol (Ethernet, IP over InfiniBand, and MPI), loop parallelization (outer or inner loop), and approaches to reading data from the NCAR GLADE parallel filesystem. We elucidate peculiarities of R memory management and overhead associated with interprocess communication and discuss broadcast limitations of Rmpi. The best performing scheme parallelizes the outer EVA loop across latitude and reads only the subset of the data operated on in the inner loop over longitude; the different cluster types and communication protocols all perform about equally for this scheme. This configuration represents a parallel speedup of 50 with 96 R workers, and is scalable for EVA on larger problem sizes than those presented here. |
Publication Date | Jun 30, 2017 |
Publisher's Version of Record | https://doi.org/10.5065/D61V5CP8 |
OpenSky Citable URL | https://n2t.org/ark:/85065/d7z89fpb |
CISL Affiliations | TDD, SDSS, OSD, USS, CSG |
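
The scheme the abstract reports as best performing (parallelize the outer loop over latitude, and have each worker read only the data needed for its inner loop over longitude) can be illustrated with a minimal sketch using foreach and doParallel. This is not the authors' code. The names `run_outer_parallel`, `read_lat_row`, and `cell_stat` are hypothetical, synthetic random data stands in for reads from the filesystem, and a simple maximum stands in for the extreme value fit.

```r
library(foreach)
library(doParallel)

# Hypothetical driver: parallelize the outer (latitude) loop and keep the
# inner (longitude) loop serial inside each worker task.
run_outer_parallel <- function(n_lat, n_lon, n_time, n_workers = 2) {
  # Stand-in for reading one latitude row from file (assumption: the real
  # analysis would read a subset of a data file here). Returns an
  # n_lon x n_time matrix of synthetic values.
  read_lat_row <- function(i) {
    set.seed(i)
    matrix(rnorm(n_lon * n_time), nrow = n_lon, ncol = n_time)
  }

  # Placeholder for the per-grid-cell extreme value analysis.
  cell_stat <- function(x) max(x)

  # Socket (TCP/IP) cluster; stopped automatically when the function exits.
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  registerDoParallel(cl)

  # Outer loop over latitude runs in parallel; each task loads only its row.
  foreach(i = seq_len(n_lat), .combine = rbind) %dopar% {
    row_data <- read_lat_row(i)
    # Inner loop over longitude, one statistic per grid cell.
    vapply(seq_len(n_lon), function(j) cell_stat(row_data[j, ]), numeric(1))
  }
}

# Small usage example: a 4 x 6 grid with 50 time steps per cell.
res <- run_outer_parallel(n_lat = 4, n_lon = 6, n_time = 50)
```

Each task receives only a latitude index and produces one row of results, so the data sent between the master process and the workers stays small regardless of the size of the full dataset.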