Cooperative Institute for Research to Operations in Hydrology

CUAHSI HydroShare Modernization

Principal Investigator: Jordan Read
Research Team: Anthony Castronova, Martin Seul
Institution: Consortium of Universities for the Advancement of Hydrologic Sciences, Inc. (CUAHSI)
Start Date: August 1, 2022 | End Date: July 31, 2024
Research Theme: Hydroinformatics

The HydroShare scientific data repository (www.hydroshare.org) supports data access and modeling in the water community. The initial HydroShare implementation has been widely adopted by the water science community but has several limitations for working with large scientific datasets (> 100 GiB). Its core concepts, specifically the resource data model, can be extended to provide these capabilities across a wide range of data types, scales, and use cases. This project has two interrelated goals to ensure that HydroShare supports the large-scale modeling envisioned for CIROH research and improves the workflow documentation needed to advance community modeling: (1) enhance the capabilities and concepts established by the HydroShare project to support large and distributed data; and (2) redesign and reimplement core HydroShare functionality for increased performance in a cloud deployment environment, leveraging container-encapsulated cloud applications and infrastructure-as-code deployment.

We approached the project by first designing the modernized system around reusable, independent software components that meet project requirements, e.g., access control, a workflow engine, and a metadata schema. In year one, efforts focused on developing these fundamental building blocks. Major accomplishments include implementing a centralized access control mechanism, introducing JSON-based metadata to improve resource management and discovery, establishing a cloud-based workflow service to support diverse data processing needs, creating workflow templates for modeling-related tasks such as subsetting the NextGen hydrofabric and collecting meteorological data, and developing an initial cloud-based HydroShare prototype. During project year two, we will continue to advance these capabilities while integrating them into CUAHSI’s operational HydroShare repository to make project outcomes accessible and usable to the broader scientific community.

This project has several operational benefits and broader impacts. The modifications to HydroShare support research in operations by providing a mechanism for CIROH scientists to build modeling studies (data collection, pre-processing, post-processing) in the cloud and subsequently publish large-scale (100+ GiB) findings within the CIROH community. This enables equitable access to the data necessary for building and executing model simulations. Project outcomes will lower the barrier to entry for using the NextGen hydrofabric in modeling applications, collecting and preparing meteorological forcing data, archiving large simulation outputs, and leveraging cloud-native data formats. Another broader impact of this work is that it establishes open-source software solutions that individual research groups in neighboring disciplines can deploy for similar large-scale modeling efforts. Our method for encapsulating scientific metadata for data stored in cloud buckets can be leveraged by community repositories to support distributed hosted data as well as compute-aligned data.