SIParCS 2017 - Marcin Jurek
Parallel Implementation of the Multiresolution Approximation Method
Massive spatial data sets are now ubiquitous in the atmospheric and Earth sciences. Traditionally, statisticians have used Gaussian processes to model such data. However, these models require inverting the covariance matrix, a computation whose cost grows cubically with the number of observations for a dense matrix and which is therefore infeasible for many data sets of interest.
One way to overcome this problem is to use multiresolution methods, among them the multiresolution approximation (Katzfuss 2017). This approach represents the original spatial process using the first few terms of its Karhunen-Loève expansion, computed with the Nyström method. The remainder is then partitioned spatially and, assuming independence across partitions, approximated again using the Nyström method. This procedure is repeated until the desired level of detail is obtained. The resulting structure leads to efficient computation that can be used to process millions of data points.
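The idea can be illustrated with a minimal two-level sketch in Python with NumPy. It is not the implementation presented in the talk. The exponential covariance, the one-dimensional locations, the knot placement, and the function names (exp_cov, nystrom_term, two_level_approx) are all assumptions made for illustration. For clarity the sketch also forms the full covariance matrix and keeps the remainder exactly within each partition, which a scalable implementation would avoid.

```python
import numpy as np


def exp_cov(x, y, range_=0.3):
    """Exponential covariance between 1-D location arrays (assumed model)."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / range_)


def nystrom_term(cov_fn, locs, knots):
    """Low-rank Nystrom term C(locs, knots) C(knots, knots)^-1 C(knots, locs)."""
    c_lk = cov_fn(locs, knots)
    c_kk = cov_fn(knots, knots)
    return c_lk @ np.linalg.solve(c_kk, c_lk.T)


def two_level_approx(cov_fn, locs, knots, n_parts):
    """Low-rank term plus a remainder treated as independent across partitions.

    Assumes locs is sorted, so contiguous index blocks are spatial partitions.
    """
    full = cov_fn(locs, locs)          # dense matrix, formed only to illustrate
    low_rank = nystrom_term(cov_fn, locs, knots)
    remainder = full - low_rank
    approx = low_rank.copy()
    for idx in np.array_split(np.arange(len(locs)), n_parts):
        block = np.ix_(idx, idx)
        # Keep the remainder inside each partition, drop it across partitions.
        approx[block] += remainder[block]
    return approx


locs = np.linspace(0.0, 1.0, 200)      # hypothetical observation locations
knots = np.linspace(0.0, 1.0, 10)      # hypothetical coarse-level knots
approx = two_level_approx(exp_cov, locs, knots, n_parts=4)
```

In this sketch the approximation matches the true covariance exactly within each partition, and the only error is the remainder covariance that is discarded between partitions. Applying the same low-rank-plus-partition step recursively to the remainder gives the multi-level structure described above.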
The talk will discuss a new Python implementation, developed largely during the internship, of a parallel algorithm for computing the multiresolution approximation for any Gaussian model. A demonstration of the predictions obtained with the implementation will follow, along with experimental results showing its runtime and memory footprint for various data sets on the Cheyenne supercomputer. A comparison with an existing in-house MATLAB implementation will also be included.
Mentors: Dorit Hammerling, Brian Vanderwende, Doug Nychka