SIParCS 2019 - Bailey Kleespies
Deploying File System Performance Metrics Through XSEDE Metrics on Demand (XDMoD)
At NCAR, supercomputing resources generate and process large volumes of climate data every day. For NCAR to keep maintaining these resources at a high level, it must be able to report on and manage the file systems that hold this data. Storage resources are expensive and limited, so management needs visibility into file system usage on NCAR's supercomputer, Cheyenne, to make strategic decisions. This project deploys a new system to help the Computational & Information Systems Lab (CISL) monitor CPU and memory performance at the job level. XSEDE (XD) Metrics on Demand (XDMoD) is a high-performance computing (HPC) tool that gives scientists access to standard metrics about their HPC resources, such as utilization, quality of service, and job-level performance. In this project, we adjust the installation of XDMoD on NCAR systems to fix issues in past installations that prevented useful reports from being generated. With XDMoD and SUPReMM (which supplies job performance data) properly installed on NCAR's systems, an accurate weekly report on Cheyenne's performance will be available for distribution.
Mentor: Shiquan Su