CISL Annual Report: FY2023
In fiscal year 2023 (FY23), the NSF NCAR Computational & Information Systems Lab (CISL) reached numerous significant milestones. One of the most prominent highlights was the smooth launch of the long-awaited Derecho supercomputing system. With its extensive graphics processing unit (GPU) computing capabilities, Derecho showcases CISL’s continued leadership in high-performance computing. In FY23, CISL also made key strides in computational and data science, including preparing for exascale and cloud computing, and reopened the redesigned visitor center at the NSF NCAR-Wyoming Supercomputing Center (NWSC) to the public.
CISL ended FY23 with Thomas Hauser named CISL Director following an international search. Hauser, who previously served as interim director, will guide the Lab through the next generation of computing opportunities and challenges as CISL continues to pursue its objectives. CISL works to provide and develop innovative Earth System Science-focused computing and data services that accelerate scientific breakthroughs and expand access; to lead applied computational and data science research for the Earth System Science community; to educate, train, and mentor the diverse and interdisciplinary Earth System Science workforce of tomorrow; and to cultivate a thriving, diverse, and fully empowered CISL staff.
Report contents
CISL By the Numbers FY23
Delivering Diverse and Cutting-Edge HPC Environments
Preparing for Exascale Computing
Data Repositories and Services
Advancing Data Science
Outreach, Diversity, and Education
CISL By the Numbers FY23
Delivering Diverse and Cutting-Edge HPC Environments
In fiscal year 2023, CISL furthered its mission to support the advancement of Earth system science for the communities it serves by launching Derecho, NCAR’s powerful new supercomputer. A “Derecho roadshow,” a half-day introduction to NCAR’s new HPE Cray EX cluster, toured NCAR’s labs through successful in-person and hybrid events. In Q2, Wyoming staff received and configured all Derecho hardware. In Q3, Derecho opened to Accelerated Scientific Discovery (ASD) users, and in Q4, the system opened to all users.
Throughout the year, CISL continued to operate NWSC, which houses its HPC resources, maintaining and supporting HPCD storage, file systems, and integrated tape archive solutions. In Q2, the team completed a Cheyenne outage to switch the system to a new coolant needed to keep it running, finishing the work a day and a half ahead of schedule.
In FY23, CISL was also recognized for the thought leadership it demonstrated through numerous user groups, talks, and workshops. For example, an early-2023 workshop on GPU acceleration on Derecho was so well received that users requested more, leading to seven additional how-to workshops on various computing topics. CISL leadership plans to strategically shift several topics to the NCAR HPC User Group (NHUG) forum, transforming NHUG into a “watering hole” for CISL services and community-of-practice discussions.
Throughout the year, CISL helped users transition to Derecho from previous systems while also supporting those who needed to continue using the older systems.
- Cheyenne, Laramie, and Casper remained operational through FY23 thanks to HPCD staff support. Cheyenne and its Laramie test system are set to be decommissioned in early 2024.
- Staff thoroughly developed, reviewed, and updated user documentation and best practices for the Help Desk, Advanced Research Computing (ARC), and Systems Accounting Manager (SAM) prior to Derecho’s launch.
- Early in the fiscal year, the High-Performance Computing Division (HPCD) extended user access to Gust, Derecho’s test system. Gust remained available for Accelerated Scientific Discovery (ASD) projects until June. As numerous users ported their applications to Gust and ran jobs, HPCD staff provided critical system and user support.
Furthermore, to prepare for cloud computing, staff conducted market research and collected user requirements and use cases from the Technology Development Division (TDD).
CISL had a strong presence at the Supercomputing 2022 (SC22) conference in Dallas, hosting a booth and holding workshops and events. Staff members also presented at the Software Engineering Assembly/Improving Scientific Software (SEA/ISS) conference and at the Cray User Group.
Finally, NCAR supercomputers supported a large volume of advanced research in the Earth system sciences.
- In FY23, Cheyenne and Derecho combined received 80 large-scale requests from universities and other institutions, totaling approximately 866 million core-hours and 175,000 GPU-hours.
- Ultimately, Cheyenne and Derecho together supported 69 projects with 652 million core-hours, 54,000 GPU-hours, and 3.1 PB of Campaign Storage.
- CISL’s annual survey of university users, which collects new publications based on work done on NCAR HPC systems, received 425 responses. Users reported 425 new peer-reviewed publications in FY23, along with 144 other publications and 94 dissertations. Respondents rated their overall satisfaction with CISL services at 4.7 on a scale of 1 to 5.
Preparing for Exascale Computing
In FY23, CISL took major steps toward its goal of preparing for the unprecedented power and promise of exascale computing.
The “Preparing for Exascale (PfE): Partnerships” team, which researches and establishes partnerships with key agencies to help move NCAR toward exascale computing, established notable connections in FY23:
- CISL leadership held two meetings with the Deutsche Klimarechenzentrum (DKRZ, the German Climate Computing Center) to discuss a formal partnership and areas of collaboration.
- CISL met with the United States Geological Survey (USGS) to explore a computational and cyberinfrastructure partnership leveraging the NWSC facility.
- CISL met with the United States Department of Energy (DOE) to define areas of collaboration in exascale computing, input/output (I/O), and other topics.
The team charged with preparing system and user support and software engineering for exascale platforms also achieved noteworthy milestones, ensuring that the codes for two ASD projects were GPU-ready:
- Cloud Model 1 (CM1), an idealized advanced numerical model designed for both microscale and mesoscale phenomena.
- MURaM¹, a magnetohydrodynamics model, in the first unified CPU/GPU version of its code.
Both the CM1 and MURaM projects ran on the GPU portion of Derecho when the system launched. The MURaM ASD project accomplished all of its science goals, and the CM1 project exceeded the scientists’ expectations.
NCAR’s Software Engineering Assembly pivoted to become a community of practice for all software engineers at NCAR, UCAR, and UCAR Community Programs (UCP). Members formed a new committee and organized the group’s first panel discussion, which will focus on continuous integration and continuous delivery/deployment (CI/CD).
The staff made significant advancements in porting physics modules to GPUs and evaluating a major code for GPU readiness. As part of the NSF-funded EarthWorks project, CISL completed the ports and integrations of the GPU versions of PUMAS/MG3²
CISL also made impressive progress in evolving NCAR machine learning models toward exascale computing. For example, the Machine Integration and Learning for Earth Systems (MILES) team:
- Developed and analyzed the Convergent Risk Intelligence for Severe Impacts to Society (CRISIS) model, a machine learning tornado risk model that incorporates uncertainties in both meteorological and societal data to estimate impacts.
- Evaluated a machine learning warm rain microphysics emulator for full incorporation into the Community Atmosphere Model (CAM) source code. The team reduced the emulator from seven neural networks to one and identified issues in the emulated microphysics scheme, leading to improvements in the overall scheme. The emulator has been retrained and is undergoing further testing in CAM.
- Successfully evaluated the scalability of the Earth Computing Hyperparameter Optimization (ECHO) package on Derecho, and added more guidance and tutorial documentation for Derecho users.
CISL also participated in NCAR-wide planning efforts to develop a Community Software Facility to accelerate the move to heterogeneous computing platforms; an implementation plan is in progress. Helping define the need for the Community Software Facility, the Exascale Tiger Team, a cross-institution committee, completed its work by delivering a series of recommendations outlining the science opportunities and challenges faced by NCAR as a whole and by each laboratory. The team presented its findings to the UCAR executive committee and to each laboratory.
Data Repositories and Services
Through multiple forward-looking initiatives, CISL-managed Data Services advanced its mission to support open data access and next-generation workflows and analyses in FY23. Throughout the year, Data Services ensured successful publication and archiving of all datasets generated either by NCAR Principal Investigators (PIs) or by external sources. Two notable datasets added to NCAR collections included a full record of model-level output from ERA-5⁵;
The Research Data Archive (RDA) was recertified by CoreTrustSeal as a Trustworthy Data Repository. The former Digital Asset Services Hub (DASH) Repository has been expanded and rebranded as the Geoscience Data Exchange (GDEX).
Data Services also rolled out a modernized website aligned with broader NCAR web modernization efforts, and implemented several innovative software enhancements:
- For services requiring user authentication, GDEX and RDA now support federated login through the Open Researcher and Contributor ID (ORCID) ecosystem, eliminating the need for users to have RDA or NCAR accounts to download data.
- The GDEX Application Programming Interface (API) is now ready to support production dataset uploads from NSF Community Instruments and Facilities (CIFs).
- To continue building DASH Search into a comprehensive one-stop shop for NCAR resources, CISL developers added filtering by time range, publisher, and data format. Also in testing: improved dataset file format coverage for International Organization for Standardization (ISO) records.
Through FY23, CISL maintained the following resources for scientific data: Stratus for Object Storage, POSIX-based Campaign Storage, tape archival and disaster recovery services, infrastructure servers, and the Bifrost Ethernet network. An effort to select new open source Object Storage infrastructure is on hold while NCAR determines whether Object Storage is the best choice for its storage needs.
Advancing Data Science
In FY23, CISL teams achieved many milestones, from delivering a product to NASA, to making progress toward cloud computing, to publicly releasing a new version of a popular geoscience software package.
The Data Assimilation Research Section (DAReS) completed an updated Data Assimilation Research Testbed (DART) interface for the TIEGCM⁷
CISL continued its support of the cross-center Earth System Data Science (ESDS) initiative. A primary aim of this grassroots effort is to build a data science community that brings together domain scientists and CISL technical staff. Notable ESDS accomplishments this year include establishing formal governance with representatives from each of the participating LCPOs, expanding participation to include Unidata and EOL, and organizing two hackathons: one on unstructured grids used in geoscience numerical modeling, and the other on key packages in the Scientific Python Ecosystem. ESDS also ran biweekly forums and staffed weekly office hours.
A cross-lab team from CISL worked to establish a cloud pilot environment, with plans to roll it out soon. With guidance from the ESDS community, the team deployed a prototype application cluster to support a hybrid cloud Pangeo/JupyterHub environment. The prototype cloud environment features data analysis capabilities and containerized microservices. Provisioning of CISL’s on-premises prototype cloud is expected to be complete in Q1 2024, and the environment will open to friendly users in Q2.
With support from an NSF EarthCube award, CISL’s Geoscience Community Analysis Toolkit (GeoCAT) team continued to lead development of the Python package UXarray, a class extension to Xarray that supports fundamental analysis operators for unstructured grids. Along with a new internal design, the team developed and publicly released several visualization and analysis functions that operate directly on the unstructured grids used by next-generation climate and global weather models. New versions of UXarray are released monthly. The initial list of user operators will be completed in the second quarter of 2024, after which the team will add further community-driven capabilities. CISL’s ViSR team also added a Python API to VAPOR (Visualization and Analysis Platform for Ocean, Atmosphere, and Solar Researchers), enabling interactive 3D visualization within a Jupyter Notebook, and expanded the VAPOR Python documentation and tutorials using Project Pythia’s infrastructure.
Outreach, Diversity, and Education
In FY23, CISL worked on multiple fronts to advance its commitment to a rich, diverse community of Earth system scientists, including by making its resources accessible to a broader cross-section of communities and individuals. CISL also invested in its own workforce, providing ongoing learning, development, and outreach opportunities for staff, students, and associates. Finally, CISL continued its mission to educate the public about the scientific research it supports.
CISL also implemented and promoted revised allocation policies and user profiles to make Derecho and other resources more relevant and accessible to a broader community. These changes introduced a new Data Analysis project type. Following these changes, small-scale allocation requests rose sharply in FY23, with 331 requests received, nearly 20% more than the previous high in FY2017. Nearly a quarter of the requests were for classroom and data analysis projects.
The Visualization Services and Research Group (ViSR) completed four visualization projects in collaboration with RAL, CGD, and CISL, including the development of an interactive dashboard. Its visualizations of megafires, created using VAPOR, won the Best Visualization award at PEARC23 and the 2023 HPCwire Readers’ Choice award for best visualization in the data analysis and AI category. Additionally, the ViSR team provided VAPOR walkthroughs and visualization support to a team of journalists from The Straits Times (Singapore), whose October article about the 2019 haze event in Singapore featured the resulting visualizations.
CISL’s flagship internship program, Summer Internships in Parallel Computational Science (SIParCS), experienced notable growth:
- CISL staff proposed 21 projects, 16 of which CISL was able to support.
- For 16 technical intern positions, SIParCS received 84 undergraduate and 63 graduate student applications.
- The non-technical CODE internship position received ten graduate student applications.
- Six interns were also funded under programs other than SIParCS, an increase over previous years.
SIParCS collaborated with NCAR Education, Engagement & Early-Career Development (EdEc) on mentor training, intern orientation, the Professional Development Workshop Series, the Poster Symposium, and social activities.
CISL completed and published its pioneering Educational Modules, developed through collaboration with teachers and their feedback via an online platform. CISL will continue to maintain the Pi-WRF¹¹
In May, the NWSC visitor center reopened after a complete redesign, featuring new videos and interactive displays that highlight the role of supercomputing in scientific discovery. The new exhibits are designed so they can be updated and modernized without a full replacement. In FY23, the center welcomed 518 visitors in total, with attendance increasing steadily each quarter, due at least in part to the reopening.
Footnotes
1 Max Planck University of Chicago Radiative MHD. MHD = magnetohydrodynamics.
2 Parameterization of Unified Microphysics Across Scales, Morrison–Gettelman version.
3 RRTM for General Circulation Model Applications—Parallel. RRTM = Rapid Radiative Transfer Model.
4 Cloud-Layers Unified by Binormals.
5 European Centre for Medium-Range Weather Forecasts Reanalysis Data, 5th-generation.
6 A high-resolution dataset for the contiguous United States (CONUS) covering more than 40 years (1980–2021) at four-kilometer grid spacing.
7 Thermosphere Ionosphere Electrodynamics General Circulation Model.
8 NASA’s Community Coordinated Modeling Center.
9 Ensemble Kalman filter–Quantile Conserving Ensemble Filter.
10 Radiative Transfer for TOVS; TOVS = TIROS Operational Vertical Sounder; TIROS = Television Infrared Observation Satellite.
11 Weather Research and Forecasting Model (WRF) on a Raspberry Pi.