SIParCS 2022 - Haniye Kashgarani
Exploring performance of GeoCAT data analysis routines on GPUs
GeoCAT-comp is a Python toolkit used by the geoscience community to analyze data. This project explores ways to port GeoCAT-comp to run on GPUs, as recent supercomputers increasingly rely on GPU accelerators as their primary compute resource. GeoCAT-comp's routines are currently either sequential or parallelized on the CPU with Dask, but the data processing is embarrassingly parallel and computationally costly, which makes it a good candidate for GPU acceleration. GeoCAT builds on NumPy and Xarray arrays and uses Dask arrays for CPU parallelization. In this project, we examined different GPU-accelerated Python packages (e.g., Numba and CuPy). Taking into account how easily the final porting method could be handed over to the GeoCAT team, we selected CuPy. CuPy is a CUDA-enabled Python array library whose interface closely mirrors NumPy's. We compared the performance of the GPU-accelerated code against the Dask CPU-parallelized code over various array sizes and resource configurations, and in strong and weak scaling studies.
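To illustrate why a NumPy-like interface eases porting, here is a minimal sketch of a routine written once and run on either backend. The function `standardized_anomaly` and the helper `get_array_module` are hypothetical names chosen for illustration and are not part of GeoCAT-comp; the sketch assumes CuPy is used when it is installed and a CUDA device is visible, and falls back to NumPy otherwise.

```python
import numpy as np


def get_array_module():
    """Return CuPy if it is installed and a GPU is usable, otherwise NumPy.

    Hypothetical helper for this sketch, not a GeoCAT-comp function.
    """
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # CuPy missing, or no usable CUDA driver/device: fall back to the CPU.
        pass
    return np


def standardized_anomaly(data, xp):
    """Subtract the mean along axis 0 and divide by the standard deviation.

    Hypothetical example routine; `xp` is the array module (NumPy or CuPy).
    """
    mean = xp.mean(data, axis=0, keepdims=True)
    std = xp.std(data, axis=0, keepdims=True)
    return (data - mean) / std


xp = get_array_module()

# Generate input on the host, then move it to the selected backend.
rng = np.random.default_rng(0)
host_data = rng.standard_normal((1000, 64))
device_data = xp.asarray(host_data)

result = standardized_anomaly(device_data, xp)

# Copy the result back to host memory if it was computed on the GPU.
result_host = result if xp is np else xp.asnumpy(result)
print(xp.__name__, result_host.shape)
```

Because the same function body serves both backends, only the array creation and the final host copy differ between the CPU and GPU paths. Whether the actual GeoCAT-comp port was structured this way is not stated here.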
Mentors: Cena Miller, Supreeth Suresh, Anissa Zacharias
Slides and poster