Accelerating CMIP data analysis with parallel computing in R

Milroy, D., Chen, S., Vanderwende, B., Hammerling, D. (2017). Accelerating CMIP data analysis with parallel computing in R. doi:https://doi.org/10.5065/D61V5CP8

Title Accelerating CMIP data analysis with parallel computing in R
Genre Technical Report
Author(s) Daniel Milroy, Sophia Chen, Brian Vanderwende, Dorit Hammerling
Abstract In this Technical Note we examine eight schemes for parallelizing Extreme Value Analysis (EVA) on Coupled Model Intercomparison Project data via R foreach, doParallel, and doMPI packages. We perform strong scaling studies to delineate the performance impacts of factors such as R cluster type (TCP/IP sockets and MPI), communication protocol (Ethernet, IP over InfiniBand, and MPI), loop parallelization (outer or inner loop), and approaches to reading data from the NCAR GLADE parallel filesystem. We elucidate peculiarities of R memory management and overhead associated with interprocess communication and discuss broadcast limitations of Rmpi. The best performing scheme parallelizes the outer EVA loop across latitude and reads only the subset of the data operated on in the inner loop over longitude; the different cluster types and communication protocols all perform about equally for this scheme. This configuration represents a parallel speedup of 50 with 96 R workers, and is scalable for EVA on larger problem sizes than those presented here.
Publication Date Jun 30, 2017
Publisher's Version of Record https://doi.org/10.5065/D61V5CP8
OpenSky Citable URL https://n2t.org/ark:/85065/d7z89fpb
CISL Affiliations TDD, SDSS, OSD, USS, CSG
