Surface-subsurface integrated water prediction system for long-term assessment with physics-informed differentiable modeling

Principal Investigator: Chaopeng Shen

Institution: Pennsylvania State University

Theme: Water Prediction - Hydrologic Modeling

Project Started: 2024

Research Team Members

Yalan Song - Pennsylvania State University

Kimberly Van Meter - Pennsylvania State University

Objective:

This project will build a differentiable integrated model to enable two-way learning from both surface and subsurface data. We hypothesize that such two-way learning can improve parameterizations and evolve process representations beyond what is possible with one data source alone.
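To make the hypothesis concrete, the following is a minimal sketch of why a single data source can leave parameters ambiguous. It is not pywatershed or MODFLOW: the two-store model, the function `run_two_store`, and the parameters `k_recharge` and `k_baseflow` are hypothetical stand-ins chosen for illustration. In this toy, swapping the two rate constants produces the same baseflow series, so streamflow-type data alone cannot tell the two parameter sets apart, while the simulated groundwater storage differs clearly.

```python
def run_two_store(precip, k_recharge, k_baseflow):
    """Toy cascade (an assumption, not the USGS models): a surface store
    drains to a groundwater store, and both act as linear reservoirs."""
    surface, groundwater = 0.0, 0.0
    baseflow_series, groundwater_series = [], []
    for p in precip:
        surface += p
        recharge = k_recharge * surface      # flux from surface to subsurface
        surface -= recharge
        groundwater += recharge
        baseflow = k_baseflow * groundwater  # flux returned to the stream
        groundwater -= baseflow
        baseflow_series.append(baseflow)
        groundwater_series.append(groundwater)
    return baseflow_series, groundwater_series


# Synthetic forcing (made-up values, arbitrary units).
precip = [5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0]

# Two parameter sets with the rate constants swapped.
flow_a, gw_a = run_two_store(precip, k_recharge=0.2, k_baseflow=0.5)
flow_b, gw_b = run_two_store(precip, k_recharge=0.5, k_baseflow=0.2)

# Largest difference between the two runs, per simulated variable.
max_flow_gap = max(abs(a - b) for a, b in zip(flow_a, flow_b))
max_gw_gap = max(abs(a - b) for a, b in zip(gw_a, gw_b))

print(f"max baseflow difference:    {max_flow_gap:.2e}")
print(f"max groundwater difference: {max_gw_gap:.2f}")
```

The baseflow difference is zero up to floating-point rounding, whereas the groundwater stores diverge from the first time step. A real coupled system is far more complex, but the same logic motivates constraining the model with both surface and subsurface observations.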

Approach:

We will use published National Hydrologic Model (NHM) datasets and differentiable parameter learning to implement and run a differentiable version of USGS's pywatershed+MODFLOW for the contiguous United States (CONUS). pywatershed simulates surface processes and groundwater recharge, while MODFLOW simulates gridded groundwater levels and provides storage, discharge, and baseflow feedback to pywatershed. We aim to respect the original structural design, but some simplifications or improvements will be needed for parallel simulation of many basins (grouped into batches), because a major design objective is for the model to learn efficiently and effectively from data at large scales.
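The idea of learning parameters by differentiating through a process model can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: the one-store linear reservoir, the `Dual` class, and the functions `simulate`, `mse_loss`, and `train_basin` are all hypothetical. A hand-written forward-mode dual number stands in for the automatic differentiation a deep learning framework would provide, the parameter is learned directly rather than through a neural network, and the batch is processed with a plain loop rather than in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value and its derivative w.r.t. one parameter (forward-mode autodiff)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        # Product rule.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def lift(x):
    """Wrap a plain number as a constant with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))


def simulate(precip, k):
    """Toy linear reservoir: each step releases a fraction k of storage as flow."""
    storage = lift(0.0)
    flows = []
    for p in precip:
        storage = storage + p
        flow = k * storage
        storage = storage - flow
        flows.append(flow)
    return flows


def mse_loss(precip, observed, k):
    """Mean squared error between simulated and observed flow."""
    total = lift(0.0)
    for q_sim, q_obs in zip(simulate(precip, k), observed):
        err = q_sim - q_obs
        total = total + err * err
    return total * (1.0 / len(observed))


def train_basin(precip, observed, k0=0.1, lr=0.01, steps=1000):
    """Gradient descent on k, with the gradient carried through the time loop."""
    k = k0
    for _ in range(steps):
        loss = mse_loss(precip, observed, Dual(k, 1.0))  # seed dk/dk = 1
        k = min(0.99, max(0.01, k - lr * loss.der))      # keep k in a valid range
    return k


# Hypothetical batch: name -> (synthetic precipitation, "true" k used to fake observations).
basins = {
    "basin_a": ([5.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0], 0.3),
    "basin_b": ([0.0, 4.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], 0.6),
}

learned = {}
for name, (precip, true_k) in basins.items():
    observed = [q.val for q in simulate(precip, true_k)]  # noise-free synthetic data
    learned[name] = train_basin(precip, observed)
    print(name, "true k:", true_k, "learned k:", round(learned[name], 3))
```

Because the observations here are generated by the same toy model without noise, the learned values should approach the values used to generate them. Real basins would involve many parameters, noisy data, and reverse-mode differentiation over batched tensors.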

Impact:

Building on our other CIROH work, this project further builds the basis for physics-respecting water quality and terrestrial models that can learn from data. This work tackles several USGS priorities in integrated water prediction, water availability assessment, and terrestrial models (USGS-RT2-FA1, USGS-RT2-FA2 and USGS-RT2-FA4).

Abstract:

To assess water availability, defined as the intersection of suitable quantity and quality for desired uses, we need models that can accurately capture surface water and groundwater changes while providing interpretable narratives for the predictions. Process-based models (PBMs) provide such narratives but cannot effectively learn from data, while deep learning (DL) models are often accurate but difficult to interpret and cannot output intermediate physical variables, e.g., groundwater recharge. Recent progress in “differentiable modeling” enables training neural networks and PBMs together on big data, reaching DL-level performance while producing intermediate physical variables. We propose to work with USGS to build a prototype differentiable model of coupled groundwater and surface water (informing USGS's pywatershed+MODFLOW system being developed in the Enterprise Capacity project) that can learn from data to produce continental-scale optimal parameters and accurate predictions. Both the surface (pywatershed) and subsurface (MODFLOW) modules will be made differentiable to enable two-way learning from surface water (streamflow, surface soil moisture, flux towers) and subsurface (baseflow, groundwater levels, deeper soil moisture, terrestrial water storage) datasets. Furthermore, we will incorporate simple nutrient and salinity modules to support water quality assessments. We will test the hypothesis that learning from surface data will strongly improve continental-scale estimates of recharge as well as scale-dependent MODFLOW parameters. Conversely, differentiable MODFLOW simulations can inform baseflow and water storage in pywatershed. Our setup further builds the basis for learnable and numerically sound water quality and terrestrial models.

Project Keywords

Differentiable Model, Parameter Learning, Parameterization, Deep Learning, Machine Learning, Physics-Informed Machine Learning, Groundwater, Streamflow, Storage, Water Quality