
A novel framework is said to offer accurate water flow predictions and simulations with unprecedented efficiency
Floods are among the most devastating natural disasters communities face. A group at the Pennsylvania State University has recently developed a computational model to streamline flood prediction in the continental US. The researchers said their model offers unprecedented efficiency and accuracy compared to traditional models, which they attribute to a highly optimized system for processing and simulating data.
Their model, described as a high-resolution differentiable hydrologic and routing model, combines big data with physical knowledge, such as data taken from river networks and theories of how river flow is generated, in a system that uses AI techniques to simulate and predict water movement. Details of the approach are published in Water Resources Research.
A common water model used by hydrologists in the US is the National Water Model (NWM) from the National Oceanic and Atmospheric Administration (NOAA), according to Chaopeng Shen, professor of civil and environmental engineering at the Pennsylvania State University and co-corresponding author of the paper. The model uses weather data to simulate streamflow, the rate at which water flows in a river, across the continental US.
Traditional models like the NWM must undergo parameter calibration, in which large datasets consisting of decades of historical streamflow data from around the United States are processed to set parameters and produce useful simulations. Although the NWM is widely used by organizations like the National Weather Service to inform flood forecasting, Shen said parameter calibration makes the process very inefficient.
“To be accurate with this model, traditionally your data needs to be individually calibrated on a site-by-site basis,” Shen said. “This process is time consuming, expensive and tedious. Our team determined that incorporating machine learning into the calibration process across all the sites could massively improve efficiency and cost effectiveness.”
The team’s model implements neural networks, a subset of AI techniques that efficiently recognize complex patterns across large, dynamic datasets. Neural networks are loosely modeled on the human brain: they form connections between their units, can operate largely autonomously, and improve over time as they analyze more data.
According to Yalan Song, assistant research professor of civil and environmental engineering and a co-corresponding author on the paper, the team’s model implements several types of neural networks to recognize the patterns of key parameters and learn how they change in time and space.
“By incorporating neural networking, we avoid the site-specific calibration issue and improve the model’s efficiency substantially,” Song said. “Rather than approaching each site individually, the neural network applies general principles it interprets from past data to make predictions. This greatly increases efficiency, while still accurately predicting streamflow in areas of the country it may be unfamiliar with.”
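The idea Song describes, learning one shared mapping from basin characteristics to model parameters instead of calibrating each site separately, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the team's implementation: the single-bucket `simulate` function stands in for the physical model, a one-layer `predict_k` stands in for the neural network, the basin attribute and rainfall values are synthetic, and finite-difference gradients stand in for the automatic differentiation a differentiable model would typically use.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(rain, k):
    """Toy physical model: one bucket that drains a fraction k of its storage per step."""
    storage, flows = 0.0, []
    for p in rain:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows

def predict_k(weights, attribute):
    """Stand-in 'network': maps one basin attribute to a drainage parameter in (0, 1)."""
    return sigmoid(weights[0] * attribute + weights[1])

def pooled_loss(weights, basins):
    """Mean squared streamflow error pooled over every basin at once."""
    sq_err, n = 0.0, 0
    for attribute, rain, observed in basins:
        simulated = simulate(rain, predict_k(weights, attribute))
        sq_err += sum((s - o) ** 2 for s, o in zip(simulated, observed))
        n += len(observed)
    return sq_err / n

def train(basins, steps=400, lr=0.5, eps=1e-6):
    """Gradient descent on the shared weights, using finite-difference gradients."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        base = pooled_loss(weights, basins)
        grad = []
        for i in range(len(weights)):
            bumped = list(weights)
            bumped[i] += eps
            grad.append((pooled_loss(bumped, basins) - base) / eps)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def true_k(attribute):
    """Hidden relationship used only to generate the synthetic observations."""
    return sigmoid(4.0 * attribute - 3.0)

# Synthetic rainfall and three synthetic "gauged" basins (hypothetical values).
rain = [1.0, 0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.3, 0.0]
basins = [(a, rain, simulate(rain, true_k(a))) for a in (0.1, 0.5, 0.9)]

initial_loss = pooled_loss([0.0, 0.0], basins)
weights = train(basins)
final_loss = pooled_loss(weights, basins)

# A basin that was never used in training still receives a parameter.
unseen_attribute = 0.3
print(f"pooled loss: {initial_loss:.4f} -> {final_loss:.6f}")
print(f"unseen basin: predicted k = {predict_k(weights, unseen_attribute):.3f}, "
      f"true k = {true_k(unseen_attribute):.3f}")
```

The point of the design is that only the two shared weights are trained, so adding another basin adds data but no new calibration problem, and a basin without streamflow records can still be assigned a parameter from its attributes.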
According to Shen, some water models make predictions entirely via machine learning trained on observational data, so they learn how water should behave only within the range of that training data. Without broad physical knowledge supporting their predictions, these models can downplay the intensity of previously unseen outliers in simulations. Such a model may use existing data to infer how a certain amount of rainfall over a set time will raise a particular river, but it would not know how to make a correct prediction when it encounters extreme rainfall events that have not been recorded in the region before. Shen said this can be dangerous in the context of flood prediction and increasing weather extremes, since it would downplay the actual risk. According to Song, the design of their model simultaneously offers the benefits of physics-based models and machine learning models, while improving the accuracy of extreme event predictions.
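The extrapolation problem Shen describes can be shown with a deliberately simple toy. The assumptions are loud ones: a nearest-neighbor lookup stands in for a purely data-driven model (real machine learning models are far more capable, though they face a related difficulty outside their training range), the "physics" is a hypothetical fixed runoff coefficient, and all rainfall values are invented.

```python
def physical_peak(rain_mm, runoff_coeff=0.4):
    """Toy mass balance: a fixed fraction of rainfall becomes runoff (hypothetical coefficient)."""
    return runoff_coeff * rain_mm

# Invented training storms; none exceeds 50 mm.
train_rain = [10.0, 20.0, 30.0, 40.0, 50.0]
train_peak = [physical_peak(r) for r in train_rain]

def nearest_neighbor_peak(rain_mm):
    """Data-driven stand-in: return the response of the most similar storm in the record."""
    idx = min(range(len(train_rain)), key=lambda i: abs(train_rain[i] - rain_mm))
    return train_peak[idx]

extreme_rain = 150.0  # far outside the training range
data_driven = nearest_neighbor_peak(extreme_rain)
physics_based = physical_peak(extreme_rain)

print(f"data-driven estimate:   {data_driven:.1f}")
print(f"physics-based estimate: {physics_based:.1f}")
```

In this toy the lookup can never return more than the largest response it has seen, so it understates the extreme storm, while the mass-balance rule keeps scaling with rainfall.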
The team trained their new model with a large dataset of streamflow information recorded at 2,800 gauge stations (sites that measure streamflow in rivers) provided by the United States Geological Survey, along with weather data and detailed basin information. Using 15 years’ worth of streamflow data, they tasked their model with creating a 40-year high-resolution streamflow simulation for river systems across the continental United States. They then compared the simulation to the observed data, measuring the variance between the two. Compared to the current version of the NWM, streamflow prediction accuracy improved substantially, by 30% overall, at approximately 4,000 gauge stations, which included the original 2,800 and additional stations not included in the training data. The improvement was especially pronounced in specific geological areas with unique structures.
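The article does not name the score used to compare simulations with observations. As one illustration of a variance-based comparison, the sketch below computes the Nash-Sutcliffe efficiency, a score commonly used in hydrology; treating it as the metric here is an assumption, and the streamflow values are made up.

```python
def nash_sutcliffe(observed, simulated):
    """1 minus the ratio of squared simulation error to the variance of the observations.

    A score of 1.0 is a perfect match; 0.0 means the simulation is no better
    than always predicting the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread

# Made-up daily streamflow values for a single gauge.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.5, 2.5, 2.5, 4.5, 4.5]

print(nash_sutcliffe(observed, observed))    # perfect simulation -> 1.0
print(nash_sutcliffe(observed, [3.0] * 5))   # constant at the observed mean -> 0.0
print(nash_sutcliffe(observed, simulated))   # -> 0.875
```

Computing such a score separately for gauges used in training and for gauges held out is one way to check whether accuracy carries over to sites the model has not seen.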
“Our neural network approaches calibration by learning from the large datasets we have from past readings, while simultaneously considering the physics-based information from the NWM,” Song said. “This allows us to process large datasets very efficiently, without losing the level of detail a physics-based model provides, and at a higher level of consistency and reliability.”
Shen said this approach to calibration is not just efficient, but highly consistent, regardless of the region being simulated.
“The old approach is not only highly inefficient, but quite inconsistent,” Shen said. “With our new approach, we can create simulations using the same process, regardless of the region we are trying to simulate. As we process more data and create more predictions, our neural network will continue to improve. With a trained neural network, we can generate parameters for the entire U.S. within minutes.”
According to Shen, their model is a candidate for use in the next-generation framework of the NWM that NOAA is developing to improve the standards of flood forecasting around the country. While it has not yet been selected, Shen said their model is “highly competitive” as it is already coupled to this operational framework. However, it may still take time for model users to get comfortable with the AI component of the model, according to Shen, who explained that careful independent evaluations are required to demonstrate that the model’s accuracy can be trusted even in untrained scenarios. The team is working to close the final gap, improving the model’s prediction capability from daily to hourly, to make it more useful for operational applications like hourly flood watches and warnings. Shen credited the research-to-operation work to civil engineering doctoral candidate Leo Lonzarich, noting that developing a framework other researchers can expand will be key to solving problems and evolving the model as a community.
“Once the model is trained, we can generate predictions at unprecedented speed,” Shen explained. “In the past, generating 40 years of high-resolution data through the NWM could take weeks, and required many different supercomputers working together. Now, we can do it on one system, within hours, so this research could develop extremely rapidly and massively save costs.”
Although these models are primarily used for flood prediction, simulations provide hydrologists with information that can be used to predict other major events, such as droughts. Such predictions could be used to inform water resource management, which Shen said could have implications for agriculture and sustainability research.
“Because our model is physically interpretable, it can describe river basin features like soil moisture, the baseflow rate of rivers, and groundwater recharge, which is very useful for agriculture and much harder for purely data-driven machine learning to produce,” Shen explained. “We can better understand natural systems that play critical roles in supporting ecosystems and the organisms within them all over the country.”