Cooperative Institute for Research to Operations in Hydrology

Machine Learning in Hydrology

The ML track will provide hands-on workshop sessions that demonstrate ML methods using current CIROH modeling projects that seek to advance the application of ML in operational hydrology. CIROH will provide the CIROH Cloud workspace to ensure a stable computing environment; workshop leads will provide the material on GitHub and tentatively cover LSTM, XGBoost, and MLP modeling algorithms. Track attendees can expect to leave with greater knowledge of data processing, ML models and their respective applications, training and evaluation procedures, and result visualization, along with a stronger foundation for applying the workflows to their own hydrological modeling objectives.

Lead: Ryan C. Johnson, University of Utah

Workshop Listings

In this workshop we will demonstrate the capabilities of, and use cases for, post-processing National Water Model forecast outputs with the Light Gradient Boosting Machine (LightGBM) implementation of gradient-boosted decision trees. We plan to demonstrate two use cases within the Lake Champlain basin. The first uses field observations to improve operational flow forecasts; the second transforms those improved flow forecasts into concentration forecasts for several water quality constituents (total phosphorus, nitrate, chloride).
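As a rough illustration of the first use case, the sketch below fits gradient-boosted regression stumps that map a raw forecast to an observed flow. It is a toy, pure-Python stand-in for the idea behind gradient-boosted decision trees, not LightGBM itself and not the workshop's code. The synthetic data, the made-up linear bias, and all function names are assumptions chosen for illustration.

```python
from math import sqrt
from typing import List, Tuple

Stump = Tuple[float, float, float]        # (threshold, value if x <= threshold, value otherwise)
Model = Tuple[float, float, List[Stump]]  # (base prediction, learning rate, stumps)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Pick the single split on x that minimizes squared error of the residuals."""
    best_sse = None
    best_stump = (x[0], mean(residuals), mean(residuals))  # fallback: no split possible
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lmean, rmean = mean(left), mean(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_stump = sse, (thr, lmean, rmean)
    return best_stump


def stump_value(stump: Stump, xi: float) -> float:
    thr, left_value, right_value = stump
    return left_value if xi <= thr else right_value


def fit_boosted(x: List[float], y: List[float],
                n_rounds: int = 200, learning_rate: float = 0.1) -> Model:
    """Squared-error gradient boosting: each stump is fit to the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_value(stump, xi) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model: Model, xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, xi) for s in stumps)


def rmse(pred: List[float], obs: List[float]) -> float:
    return sqrt(mean([(p - o) ** 2 for p, o in zip(pred, obs)]))


# Synthetic data (assumption): the raw forecast is linearly biased relative to observations.
forecast = [0.5 * i for i in range(1, 41)]
observed = [0.8 * f + 0.5 for f in forecast]

# Alternate samples go to training and testing so the evaluation is out of sample.
train_f, train_o = forecast[0::2], observed[0::2]
test_f, test_o = forecast[1::2], observed[1::2]

model = fit_boosted(train_f, train_o)
corrected = [predict(model, f) for f in test_f]

rmse_raw = rmse(test_f, test_o)
rmse_corrected = rmse(corrected, test_o)
print(f"raw RMSE: {rmse_raw:.3f}, post-processed RMSE: {rmse_corrected:.3f}")
```

A real post-processor would use many more predictors (lead time, season, upstream observations) and a library implementation with deeper trees; the single-feature stumps here only show the residual-fitting loop.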

The main goal of this workshop is to introduce a post-processing machine learning framework that uses the Long Short-Term Memory (LSTM) algorithm for bias correction of hydrological models in operational settings, with an emphasis on low flows and on accounting for the impact of water operations. The session begins with an overview of machine learning concepts and the LSTM algorithm, followed by hands-on activities on the ML model development pipeline: data preprocessing, feature selection, hyperparameter tuning, model training, and evaluation. By the end of the workshop, attendees will have a solid understanding of ML techniques for hydrological applications and practical experience applying LSTM to improve the accuracy of hydrological models. This workshop is designed for hydrology professionals, researchers, and students, regardless of prior ML experience, and builds upon last year’s session with updated content and new insights.

SHAP (SHapley Additive exPlanations) values for Interpretable Machine Learning

Day 2 Session 2

Ali Dadkhah
Harrison Myers
Shaurya Swami
Ryan van der Heijden
Kristen Underwood

This workshop introduces participants to SHAP (SHapley Additive exPlanations) values (Lundberg & Lee, 2017), a useful tool for interpreting machine learning models by analyzing feature importance. Participants will learn the fundamentals of SHAP values and their practical applications to understand how individual features influence predictions in regression and classification models. Using Python libraries, we will walk through hands-on examples to explore how SHAP values can enhance model transparency, provide actionable insights, and build trust in machine learning predictions, with a focus on real-world applications in water resources.
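To make the idea concrete, the sketch below computes exact Shapley values for one prediction of a small hand-written model by enumerating every feature subset. It is a minimal illustration under stated assumptions, not the SHAP library's API: the toy model, the feature values, and the use of a single all-zero reference sample are invented for the example, and practical SHAP tooling typically relies on a background dataset and faster approximations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   background: Sequence[float]) -> List[float]:
    """Exact Shapley values for sample x relative to one reference sample.

    Features in a coalition take their value from x; all others take the
    reference value from `background`.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical response: two additive terms plus an interaction;
    # the third feature is deliberately ignored.
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[1]


x = [1.0, 2.0, 5.0]           # sample to explain (assumed values)
background = [0.0, 0.0, 0.0]  # reference sample (assumed)

phi = shapley_values(toy_model, x, background)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the reference sample, the interaction term is split evenly between the two features involved, and the ignored feature receives zero.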

Machine learning and deep learning methods for geospatial modeling in the hydrologic sciences. Students will use convolutional networks to predict hydrological conditions from images, first learning the methodology on synthetic datasets and then applying it to satellite images in real-world scenarios.
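The core operation in a convolutional network can be sketched in a few lines. The example below slides a small kernel over a synthetic image and shows how a horizontal-gradient kernel responds to a boundary. The image, the "dry/wet" interpretation, and the kernel are assumptions made for illustration; real networks learn their kernels from data and stack many such layers.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image with no padding and stride 1.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped), which is conventionally called convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out: Grid = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Synthetic 4x5 "image" (assumption): 0 = dry pixel, 1 = wet pixel.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]

# Horizontal-gradient kernel: responds where values increase from left to right.
kernel = [[-1.0, 1.0]]

edges = conv2d_valid(image, kernel)
for row in edges:
    print(row)
```

Each output row is nonzero only at the column where the dry-to-wet transition occurs, which is the kind of local spatial feature a trained convolutional layer can pick up.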

A Tandem EVolutionary Algorithm for Feature Selection (TEVA)

Day 3 Session 2

Kristen Underwood
Donna Rizzo
John Hanley
Ali Dadkhah
Ryan van der Heijden
Cailin Gramling
Shaurya Swami

Underwood et al. (2023) recently introduced the tandem evolutionary algorithm (TEVA) of Hanley et al. (2020) to the water resources and ecology domains, applying it to identify features (catchment-scale attributes) and feature interactions important in determining patterns of dissolved organic carbon across the continental US. TEVA has particular advantages for feature selection in large, multivariate observational data sets of complex systems such as riverscapes or ecosystems, and has been shown to outperform logistic regression and Random Forest at identifying feature interactions and equifinality. TEVA finds relationships among multiple variables that may result from either additive processes or feature interactions. It not only extracts features significantly associated with one or more outcome classes, but also identifies the specific value ranges associated with those features. The algorithm is also robust to mixed data types (continuous, categorical), missing data, censored data, skewed distributions, and unbalanced target classes or clusters.