SIParCS 2021 - Sama Manalai
Harvester Automation for Metadata Search Web Application: Technologies
The Metadata Search Application is a multi-service search application that lets scientists search across all digital scientific assets maintained by the National Center for Atmospheric Research (NCAR). It is a web-based search tool consisting of Harvester and Search services built on top of the Apache Solr search engine. The Harvester service reads metadata XML files produced by NCAR labs and sends that metadata to Solr. The Search service then queries Solr and displays the search results to the end user. The goal of this summer's work was to improve the performance of the Harvester service and reduce manual work. Originally, before the Harvester service could read the metadata XML files, a developer had to retrieve them manually from metadata repositories on GitHub through a command-line interface. The Harvester service could not pull updates to previously cloned repositories on startup, and it had to re-read every file each time it ran. Over the summer, new capabilities were added to the Harvester service so that it harvests data more efficiently and without human intervention, which speeds up the process of updating metadata. This presentation discusses how the Harvester service was automated using technologies such as Spring Boot, JGit, webhooks, IntelliJ, Ngrok, and Docker.
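As an illustration of the "re-read every file" problem described above, the following is a minimal sketch of one way a harvester could skip unchanged metadata files between runs. It is not the project's implementation: the class name ChangeTracker, the method changedFiles, and the choice of SHA-256 content digests are all assumptions made for this example, and it uses only the Java standard library. In a real service the file contents would come from a local clone of a metadata repository rather than an in-memory map.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: remembers one digest per file so unchanged files can be skipped. */
public class ChangeTracker {
    // Digest recorded for each path the last time it was seen (assumed to live in memory only).
    private final Map<String, String> lastSeenDigests = new HashMap<>();

    /** Returns, in sorted order, the paths whose content is new or has changed since the previous call. */
    public List<String> changedFiles(Map<String, String> xmlByPath) {
        List<String> changed = new ArrayList<>();
        // TreeMap gives a deterministic (sorted) iteration order.
        for (Map.Entry<String, String> entry : new TreeMap<>(xmlByPath).entrySet()) {
            String digest = sha256(entry.getValue());
            // put() returns the previously stored digest, or null for a file never seen before.
            String previous = lastSeenDigests.put(entry.getKey(), digest);
            if (!digest.equals(previous)) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    /** Hex-encoded SHA-256 digest of the given text. */
    private static String sha256(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is mandatory for every Java platform implementation.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

With a tracker like this, only the paths returned by changedFiles would need to be parsed and sent to Solr on a given run; files whose content has not changed since the previous run are skipped.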
Mentors: Nathan Hook, Christy Grant, Saquib Kahn, & Eric Nienhouse
Slides and poster