CIROH: Data Science Team
The CIROH Data Science Team targets NOAA research priorities, seeks new data sources and products to conduct state-of-the-science research, develops novel models to address research and operational gaps, and creates innovative research products to communicate and share CIROH hydrological advancements.
Summary: CIROH data science objectives require the technical skill sets to solve complex hydrological problems and the curiosity to identify which problems need to be solved. Leveraging a new era of big data, data science tasks connect mathematics, computer science, and statistical analyses with hydrological phenomena, are directed by NOAA's research and operational priorities, and address the respective gaps by developing scientifically reproducible approaches. The Data Science Team leverages an array of methods, data, and tools to accomplish its mission, including but not limited to tools and libraries within Python for data analysis and modeling, interactive communication platforms such as Tethys to share information and results, APIs connecting data to processing, stakeholder engagement to identify key gaps and areas for improvement, and peer-reviewed publications and conference presentations to communicate advances in hydrological data science. The objectives center on Next Generation (NextGen) water modeling. They seek to produce innovative methods for connecting new data sources to the National Water Model (NWM) (e.g., data preprocessing, data assimilation), novel model formulations for meteorology, land surface processes, and streamflow generation over the NextGen hydrofabric (e.g., snow processes, conversion to streamflow), and uses of NWM outputs that increase stakeholder engagement (e.g., supply outlooks, flood forecasting awareness). Fitting within the research themes, the Data Science Team builds upon existing data streams, develops pre- and post-processing tools, integrates machine learning, enhances visualization, and implements open-source tools to turn research and data products into information that benefits hydrological decision-making. Most workflows are cross-cutting, supporting broader CIROH applications, and can be individualized to address specific stakeholder needs.
Research and Operational Gaps: The CIROH Data Science Team develops and maintains contemporary knowledge of the state of the science in hydrological modeling from both research and operational perspectives. The necessary operational knowledge includes, but is not limited to, an understanding of the NWM's internal modeling processes connecting precipitation to streamflow, and of the motivations and structure for transitioning to the Basic Model Interface (BMI) and hydrofabric structure of NextGen. The conceptual workflow includes data assimilation, model connectivity and pathways of relevant sub-modules, model outputs and respective products, general comprehension of data management and computing, and data access protocols to connect extension products to the NWM. Complementing this operational knowledge, the Data Science Team has a comprehensive understanding of state-of-the-science hydrological modeling inputs, processes, methods, outputs, and sharing platforms. Within these hydrological modeling topics, each data scientist is an expert in a particular subject (e.g., machine learning, spatial modeling, snow modeling, climate downscaling, hydrological processes). The CIROH Data Science Team develops innovative methods to support NOAA and USGS research priorities.
Data Mining and Exploration: Data is a key driver of modeling advancements but is rarely available in the correct form or location to support hydrological research or operations. After selecting a research priority, the CIROH data science tasks include identifying applicable data sources to address the overarching research question(s), guided by technical readiness levels (TRL) and/or the ability to demonstrate a clear pathway into, or support for, operations. Example sources may include, but are not limited to, actively available and under-utilized remote sensing products, citizen science/crowd-sourcing, externally operated in situ sensors, and data-fusion techniques. The Data Science Team identifies potential sources and brainstorms their utility for research or operations. The tasks use open-source tools (e.g., Python) to support data acquisition (e.g., APIs, cloud storage), processing and cleaning, and evaluation. Data mining and exploration support the development of tools for data access, data assimilation, data processing, and evaluation that are shared with all CIROH members and the greater hydrological community.
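As an illustration of the processing and cleaning step described above, the following is a minimal sketch in Python using pandas. It is not a CIROH tool: the function name `clean_streamflow`, the `max_gap_hours` parameter, the hourly time step, and the rule that negative discharge is a sensor error are all assumptions made for the example, and the observations are synthetic.

```python
import pandas as pd


def clean_streamflow(raw: pd.Series, max_gap_hours: int = 3) -> pd.Series:
    """Regularize an hourly streamflow series and fill only short gaps.

    Hypothetical helper for illustration; thresholds are assumptions.
    """
    # Sort by time with a stable sort so duplicate timestamps keep their order.
    series = raw.sort_index(kind="mergesort")
    # Drop duplicate timestamps, keeping the first reported value.
    series = series[~series.index.duplicated(keep="first")]
    # Treat negative discharge as a sensor error (assumption) and mask it.
    series = series.where(series >= 0)
    # Reindex onto a regular hourly grid so missing reports become explicit NaN.
    hourly = pd.date_range(
        series.index.min(), series.index.max(), freq=pd.Timedelta(hours=1)
    )
    series = series.reindex(hourly)

    # Measure the length of each run of consecutive missing values.
    is_gap = series.isna()
    run_id = is_gap.astype(int).diff().ne(0).cumsum()
    run_len = is_gap.groupby(run_id).transform("sum")

    # Interpolate in time, but accept the result only inside short gaps.
    filled = series.interpolate(method="time", limit_area="inside")
    short_gap = is_gap & (run_len <= max_gap_hours)
    return series.where(~short_gap, filled)


# Synthetic observations: one duplicate timestamp, one negative value,
# one 1-hour gap (02:00), and one long gap (05:00 through 08:00).
raw = pd.Series(
    [10.0, 12.0, 99.0, 16.0, -5.0, 20.0],
    index=pd.to_datetime(
        [
            "2024-01-01 00:00",
            "2024-01-01 01:00",
            "2024-01-01 01:00",
            "2024-01-01 03:00",
            "2024-01-01 04:00",
            "2024-01-01 09:00",
        ]
    ),
)
cleaned = clean_streamflow(raw)
print(cleaned)
```

In this sketch the single missing hour is interpolated, while the longer gap, which the masked negative value extends, is left as missing rather than filled with values the sensors never reported. Which gaps are safe to fill depends on the data source and the use case.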
Predictive Modeling: The CIROH Data Science Team connects observations to predictions to support NOAA’s goal of creating a Water Ready Nation. Modeling is fundamental to understanding and predicting hydrological phenomena, whether through data-driven approaches leveraging advanced machine learning algorithms, conceptual models replicating behavior at a system scale, or physically based models explicitly characterizing the hydrological processes governing streamflow generation. Data Mining and Exploration research activities support Predictive Modeling advancements, where new data sources provide a platform to calibrate physically based models and form new features for data-driven workflows. Predictive modeling advancements rely on feature engineering, exploration of different model formulations, and evaluation of model performance based on specific use cases and testbed environments. When applicable, the CIROH Data Science Team makes all research products publicly available to support other CIROH researchers and education activities such as curriculum development (e.g., data-driven modeling tutorials).
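The text above does not name specific performance metrics. As one hedged example of evaluating model performance, the sketch below computes two metrics widely used for streamflow, the Nash-Sutcliffe efficiency (NSE) and the Kling-Gupta efficiency (KGE, in its 2009 form). The choice of metrics, the function names, and the sample values are assumptions made for illustration, not a statement of the team's evaluation protocol.

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, simulated) -> float:
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    residual = np.sum((obs - sim) ** 2)
    variance = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - residual / variance)


def kling_gupta_efficiency(observed, simulated) -> float:
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Synthetic daily flows (arbitrary units) for demonstration only.
observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
simulated = np.array([1.2, 1.9, 3.3, 3.8, 5.1])

print("NSE:", nash_sutcliffe_efficiency(observed, simulated))
print("KGE:", kling_gupta_efficiency(observed, simulated))
```

Both metrics equal 1 for a perfect simulation. An NSE of 0 means the model is no better than predicting the observed mean. KGE separates the error into correlation, variability, and bias terms, which can help diagnose why a formulation underperforms for a given use case.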
Research Products: Supporting the transition of hydrological research into operations (R2O) is a fundamental motivation of the CIROH Data Science Team. The R2O process emphasizes the communication and peer review of hydrological modeling advancements to support idea and workflow validation. Research products do not need to be in a final form to contribute to CIROH or the overall hydrological community, as sharing in-progress work, knowledge, or challenges provides a pathway to community development and collaborative solutions. Initial research products share in-progress work and/or challenges surrounding the nation’s water. A key focus is for research products to be cross-cutting, supporting broader CIROH applications, and individualized to address specific stakeholder needs. Preliminary research products need to express key water resources challenges and opportunities, clearly stating the research priority and the need for innovative hydrological modeling advancements to support emergency management, resource management, or general public awareness. As CIROH matures and projects enter a refined form, research products transition to decision-making tools, conference presentations, and peer-reviewed manuscript submissions that provide a key research or operational advancement to NOAA and the greater hydrological science community. The development and finalization of research products highlight new research and operational gaps, thereby setting the stage for future work.