SIParCS 2017 - Yun Joon Soh
Optimizing the Statistical Compression Method
Increasing computational performance produces ever-larger data outputs, which strain data storage capacity. This is especially problematic in Earth system modeling, where the resulting spatio-temporal data files can be massive. A variety of deterministic compression methods have been tested to reduce storage demands, and statistical methods have recently been proposed as well. One such method is based on half-spectral space-time models (Guinness and Hammerling, 2017), in which the time series of climate data at each grid point is expressed in the spectral domain using Fourier coefficients. When mapped at each frequency, the Fourier coefficients exhibit spatial continuity, which makes it possible to store only a subset of the coefficients and model the rest. Finding the optimal subset of Fourier coefficients to store is intractable for any realistic data size, so the method instead uses a greedy search algorithm based on residual metrics. We explored two ways to improve the greedy search: testing different residual metrics to improve the quality of the reconstructed fields, and parallelizing the algorithm to speed it up. We demonstrate the results of these improvements on a data set of one year of daily temperature data, comprising approximately 20 million data points. The improvements reduce the root mean squared prediction error by up to 6.5 percent and lower the runtime by a factor of 2.
Mentors: Dorit Hammerling, Brian Vanderwende, Doug Nychka
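
The greedy search described in the abstract can be illustrated with a minimal sketch. This is not the project's implementation: it assumes a one-dimensional grid, replaces the half-spectral statistical model with plain linear interpolation between stored coefficients, and uses the largest absolute residual as the selection metric. The function names greedy_select and reconstruct, and the synthetic temperature-like data, are hypothetical and exist only for illustration.

```python
import numpy as np


def reconstruct(coeffs, idx):
    """Rebuild all coefficients from the stored ones at sorted indices idx.

    Assumption: linear interpolation stands in for the statistical model.
    """
    x = np.arange(coeffs.size)
    real = np.interp(x, idx, coeffs[idx].real)
    imag = np.interp(x, idx, coeffs[idx].imag)
    return real + 1j * imag


def greedy_select(coeffs, n_keep):
    """Greedily pick which grid points' Fourier coefficients to store.

    coeffs: 1-D complex array, one coefficient per grid point at a
            single frequency (the 1-D grid is an assumption).
    n_keep: number of coefficients to store, between 2 and coeffs.size.
    """
    n = coeffs.size
    if not 2 <= n_keep <= n:
        raise ValueError("n_keep must be between 2 and the number of grid points")
    kept = [0, n - 1]  # store both endpoints so interpolation covers the grid
    while len(kept) < n_keep:
        idx = np.sort(kept)
        # Residual metric (assumed): absolute error of the current reconstruction.
        resid = np.abs(coeffs - reconstruct(coeffs, idx))
        resid[idx] = -1.0  # stored points can never be selected again
        kept.append(int(np.argmax(resid)))  # store the worst-predicted point
    return np.sort(kept)


# Synthetic stand-in for one year of daily data on 64 grid points.
rng = np.random.default_rng(0)
n_points, n_days = 64, 365
space = np.linspace(0.0, 1.0, n_points)[:, None]
time = np.arange(n_days)[None, :]
data = (10.0 * np.sin(2 * np.pi * space) * np.cos(2 * np.pi * time / n_days)
        + 0.1 * rng.standard_normal((n_points, n_days)))

# Grid-point-wise Fourier transform along the time axis.
spectrum = np.fft.rfft(data, axis=1)
coeffs = spectrum[:, 1]  # coefficients at the annual-cycle frequency

kept = greedy_select(coeffs, n_keep=8)
rmse = np.sqrt(np.mean(np.abs(coeffs - reconstruct(coeffs, kept)) ** 2))
print("stored grid points:", kept)
print("RMSE of reconstructed coefficients:", rmse)
```

Each pass recomputes the reconstruction and stores the coefficient that is currently predicted worst, which is the general shape of a residual-driven greedy search. Swapping the line that computes resid for a different metric corresponds to the kind of experiment the abstract describes. Because the frequencies are handled independently in this sketch, the loop over frequencies is a natural candidate for parallelization.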