Deep Learning Modeling and Data Assimilation Capabilities in the NextGen Framework
Research Team Members
Objective:
Machine learning modeling capabilities for the NextGen water modeling framework
Approach:
We developed a basic model interface for a deep learning hydrological module, trained deep learning weights, and successfully demonstrated simulations in the NextGen water modeling framework. We also built ensemble prediction capabilities, demonstrated state-saving and model restart, and developed a training scheme designed specifically for the NextGen framework.
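NextGen modules are commonly exposed through initialize/update-style model interface calls. As a hedged illustration only (the class name, weight shapes, and methods below are hypothetical, not the project's actual module), a minimal NumPy LSTM wrapped in that calling pattern might look like:

```python
import numpy as np

class LstmBmiSketch:
    """Minimal model-interface-style wrapper around one NumPy LSTM cell.

    Hypothetical sketch: names and weight shapes are illustrative and do
    not reflect the project's actual NextGen module.
    """

    def __init__(self, n_input, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, output gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_input + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden
        self.h = self.c = None

    def initialize(self):
        # initialize(): zero the recurrent hidden and cell states.
        self.h = np.zeros(self.n_hidden)
        self.c = np.zeros(self.n_hidden)

    def update(self, forcing):
        # update(): advance one timestep given a forcing vector.
        z = self.W @ np.concatenate([forcing, self.h]) + self.b
        i, f, g, o = np.split(z, 4)
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        self.c = sig(f) * self.c + sig(i) * np.tanh(g)
        self.h = sig(o) * np.tanh(self.c)
        return self.h

    def get_value(self):
        # Output head omitted; the hidden state stands in for model output.
        return self.h
```

Because the states live on the object, saving and reloading `h` and `c` between `update` calls is what enables the restart capability described above.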
Impact:
By embedding data-driven models directly into the NextGen framework, we reduce the need for manual calibration, extend modeling capabilities to ungauged basins, and enhance the speed and adaptability of national-scale forecasts. This lays the foundation for smarter, more flexible operational water models capable of improving flood forecasting, water resource management, and climate adaptation.
Abstract:
Introduction:
This project advances the use of deep learning to improve national-scale water prediction systems. Specifically, we are developing a long short-term memory (LSTM) module integrated directly into the NextGen hydrological modeling framework, which is intended to power future versions of the U.S. National Water Model.
Context:
Our challenge is to create a deep learning module that can learn directly from data and generalize across many catchments, including ungauged ones. We aim to address core limitations in operational modeling: the limited adaptability of the NextGen framework, hydrofabric catchment divides that are much smaller than typical gauged areas, and the difficulty of training LSTM weights under these constraints.
Impact:
The LSTM module may be included in the operational NextGen National Water Model; development in this project is entirely dedicated to that goal.
Concept and its significance:
To develop the operational module and trained LSTM weights, we have been building an easy-to-use and easy-to-update training framework, spun off from the NeuralHydrology package. NOAA and National Water Model stakeholders will be able to validate module performance during runs of the NextGen framework across CONUS.
Challenges:
Two main challenges have arisen: 1) how do we best train a deep learning model for the NextGen framework, and 2) how do we integrate the LSTM into the NGIAB datastream. The NextGen hydrofabric discretizes the domain into catchments much smaller than the typical streamgauge contributing area. This means that when a deep learning model is trained on a wide range of stream gauges, it rarely receives information directly related to the response of the individual contributing catchments. This problem persists in classical model calibration as well and is being addressed by the ngen-cal product. In deep learning, the difficulty is that we require a gradient chain from the gauge location all the way through the modeling components to each hydrologic process, which is infeasible with the NextGen configuration.
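One way to picture the scale mismatch: a gauge observes the aggregate response of many small hydrofabric catchments, so a training loss can only be formed after the per-catchment predictions are combined at the gauge. The sketch below uses simple area-weighted aggregation as an illustrative stand-in for channel routing; the function name and setup are hypothetical, not the project's actual training scheme.

```python
import numpy as np

def gauge_loss(catchment_runoff, areas_km2, observed_gauge):
    """MSE between observed gauge flow and the area-weighted sum of
    per-catchment runoff predictions.

    Illustrative stand-in for routing: a real NextGen simulation routes
    flow through the channel network rather than summing instantaneously.
    catchment_runoff: (n_timesteps, n_catchments) predicted runoff per area
    areas_km2:        (n_catchments,) contributing areas
    observed_gauge:   (n_timesteps,) observed flow at the downstream gauge
    """
    simulated = catchment_runoff @ areas_km2  # aggregate to the gauge
    return float(np.mean((simulated - observed_gauge) ** 2))
```

If the aggregation (or routing) step is differentiable, gradients from this gauge-level loss can flow back to every contributing catchment, which is the gradient chain the paragraph above describes.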
What is the product:
We are delivering a working software module that integrates deep learning into NextGen, enabling:
1. Distributed prediction of streamflow from atmospheric input data and catchment attributes.
2. Model ensemble capability, where multiple predictions can be made and compared.
3. State-saving and model restart for long-term simulations and operational use.
4. Scoring-rule-based loss functions to enforce physically meaningful learning.
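The state-saving and restart capability in item 3 can be sketched as follows. This is a hedged toy example (the update rule is a stand-in, and the file name is arbitrary), showing only the property a restart test checks: a checkpointed run must continue exactly as an uninterrupted one.

```python
import numpy as np

def step(h, c, x):
    # Stand-in recurrent update; a real module would evaluate its LSTM cell here.
    c = 0.9 * c + 0.1 * np.tanh(x + h)
    h = np.tanh(c)
    return h, c

def run(forcings, h, c):
    # Advance the (hidden, cell) state through a sequence of scalar forcings.
    for x in forcings:
        h, c = step(h, c, x)
    return h, c
```

Saving `h` and `c` mid-run (for example with `np.savez`), reloading them, and resuming over the remaining forcings should reproduce the uninterrupted trajectory bit-for-bit.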
What’s new?
Although the LSTM has been widely studied in the hydrologic literature, we are ensuring that the LSTM as a NextGen module holds up to the standards set by academic studies. We have tested the LSTM for selecting the best model for each watershed. We have performed sensitivity analyses including up to 200 static catchment attributes for generalizability and demonstrated that relatively few attributes (about 10) are needed for best performance. We have also shown that LSTM NextGen simulations match the flow duration curve of gauged runoff across large domains.
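For reference, a flow duration curve ranks flows against their exceedance probability; comparing the simulated and gauged curves is how the match above is assessed. A minimal sketch (the function name is illustrative; the Weibull plotting position is one common convention):

```python
import numpy as np

def flow_duration_curve(flows):
    """Return (exceedance_prob, sorted_flows) for a 1-D flow series.

    Exceedance probability uses the Weibull plotting position i / (n + 1).
    """
    q = np.sort(np.asarray(flows, dtype=float))[::-1]  # descending flows
    n = q.size
    p = np.arange(1, n + 1) / (n + 1)                  # exceedance probability
    return p, q
```

Evaluating this on simulated and observed series over the same period and comparing the two curves (e.g., at fixed exceedance percentiles) gives a signature-based check that complements timestep-by-timestep error metrics.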
What is being done today?
1. We introduce physically informed loss functions that help the model learn meaningful hydrologic patterns (e.g., the rise and fall of a flood hydrograph).
2. We develop temporal and spatial batching techniques that allow scalable training across thousands of watersheds.
3. We are testing whether the LSTM memory state can be analyzed to detect unique or extreme hydrologic events.
4. An REU team contributes by building a surrogate model of the existing NWM for faster simulations and multivariate prediction.
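The scoring-rule-based losses mentioned earlier can be estimated directly from an ensemble. As a hedged sketch, the standard sample-based estimator of the continuous ranked probability score (CRPS) is shown below; this is the generic formula from the forecast-verification literature, not necessarily the project's exact loss.

```python
import numpy as np

def crps_ensemble(ensemble, observation):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    Lower is better; for a one-member ensemble it reduces to the
    absolute error, so it generalizes deterministic scoring.
    """
    x = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(x - observation))              # accuracy term
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # spread term
    return float(term1 - term2)
```

Because the spread term rewards a well-dispersed ensemble rather than collapsing all members onto one trajectory, a CRPS-style loss pairs naturally with the ensemble prediction capability described above.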