SIParCS 2022 - Teagan Johnson
Developing a Scientific Data Search Engine: Architecture & Technologies
External scientists and other users have historically found it difficult to search across NCAR’s diverse scientific data holdings and find the data, publications, and software they need for their science. A search tool that aggregates these holdings already exists; this project experiments with a different technical approach to search.
This project focuses on creating a minimal and robust search application that may eventually be deployed. Going into the summer, our goal was to enhance the technical design and improve the user-facing features of the application. The changes we made included validating the many thousands of metadata files in the search engine, significantly improving the efficiency of deleting metadata records when they are removed from each of NCAR’s labs, providing a way for Google to efficiently crawl and index the scientific metadata, adding the ability to search with facets, and more.
To accomplish these goals, we used web development technologies (Spring Boot, Solr, GitHub, etc.), followed Agile Scrum practices, and applied best practices for web development. Our work demonstrates the value such a tool can provide to scientists who are solving some of the biggest challenges related to atmospheric and Earth systems. The search engine's robust and minimal design allows for easy and accurate access to NCAR’s rich archive of scientific metadata.
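As a rough illustration of the faceted search mentioned above, the sketch below builds the query string for a Solr select request using Solr's standard `q`, `fq`, `facet`, and `facet.field` parameters. It is written in Python for brevity and is not the project's Spring Boot code; the function name `build_facet_query` and the field names `resource_type` and `publisher` are hypothetical.

```python
from urllib.parse import urlencode


def build_facet_query(text, facet_fields, filters=None, rows=10):
    """Return a URL-encoded query string for a faceted Solr search.

    text         -- the user's search terms (sent as Solr's `q` parameter)
    facet_fields -- fields to count values for (one `facet.field` each)
    filters      -- facet values the user has already selected, sent as
                    `fq` filter queries (hypothetical field -> value map)
    """
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    for field in facet_fields:
        params.append(("facet.field", field))
    for field, value in (filters or {}).items():
        # Quote the value as a phrase; escape backslashes and quotes in it.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    return urlencode(params)


# Hypothetical usage: search for "ozone", narrowed to datasets.
query = build_facet_query(
    "ozone",
    facet_fields=["resource_type", "publisher"],
    filters={"resource_type": "dataset"},
)
```

The resulting string would be appended to a Solr core's `/select` endpoint. Selected facet values go in `fq` rather than `q` so that they narrow the result set without changing how the remaining results are scored.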
Mentors: Nathan Hook, Saquib Aziz Khan, Eric Nienhouse
Slides and poster