
Citation
Jesus Gomez-Velez, Md Abu Bakar Siddik, Sean Turner, Ahad Tanim and Shih-Chieh Kao. 2025. Foundational Dataset for Developing Large-Sample Stream Temperature Models in the Conterminous United States. HydroSource. Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
Overview
This dataset provides the inputs, evaluation results, and trained weights of a large-sample Long Short-Term Memory (LSTM) model that predicts daily stream temperature in unregulated river reaches across the conterminous United States (CONUS). It includes dynamic meteorological and hydrologic forcings, static physiographic attributes, and model outputs from cross-validation experiments spanning 300 basins. The dataset supports reproducible modeling and direct application to new basins, and it provides data suitable for integration with reservoir and river simulations under current and future climates. It contains two .zip files, described below:
· RQ-AI_runs.zip: Model outputs from 10-fold cross-validation experiments, including observed and predicted daily stream temperatures, along with test performance metrics for water years 2017–2019. Two versions are included:
1. Model trained and validated using subbasin-area-weighted dynamic features.
2. Model trained and validated using whole-basin-area-weighted dynamic features.
· RQ-AI_inputs.zip: Collection of all formatted dynamic and static predictor datasets (meteorological, hydrologic, and physiographic features) used in model training and analysis. Detailed instructions and a description of the data structure are available in the following GitLab repository: https://code.ornl.gov/tempwise/training.
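Because the internal layout of the archives is documented in the repository rather than here, a quick way to get oriented after downloading is to list what each archive contains. The sketch below uses only Python's standard `zipfile` module. The helper name `summarize_archive` is hypothetical, and the code assumes the archive has already been downloaded to the local path passed in; it makes no assumption about which file types are actually present.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a .zip archive by file extension.

    zip_path: path to a local .zip file (or an open binary file object).
    Returns a Counter mapping lowercase extension -> number of files;
    files without an extension are counted under "(none)".
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries end with "/" and are skipped.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix.lower() or "(none)" for n in names)
```

For example, calling `summarize_archive("RQ-AI_inputs.zip")` from the download directory returns the number of files per extension, which helps decide which reader to use before extracting anything.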
Methodology
USGS gauges reporting daily average water temperature in °C (parameter ID 00010) were first identified, and training sites were then selected from these gauges based on record length and completeness.
Predictor features combine dynamic and static datasets. Dynamic variables were obtained from Daymet and Dayflow, which provide daily meteorological and streamflow information, respectively, for approximately 2.7 million stream reaches represented in NHDPlusV2 across CONUS. For each day in water years 2011–2019, meteorological inputs were derived from Daymet and spatially averaged over each basin by overlaying the Daymet grid on the basin polygon and computing mean values. Historical streamflow reanalysis data for the selected sites were extracted from Dayflow.
Static predictors were compiled from StreamCat, which provides more than 600 physiographic, land cover, and watershed metrics for local and upstream catchments within NHDPlusV2.
Stream temperature was modeled with a Long Short-Term Memory (LSTM) neural network that integrates the dynamic meteorological and hydrologic inputs with the static catchment attributes. The output layer comprises a single neuron with linear activation, which yields the predicted daily stream temperature (°C).
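The water-year window and the spatial averaging step can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names `water_year` and `area_weighted_mean` are hypothetical, and weighting each grid cell by its area of overlap with the basin polygon is an assumption about how the basin mean was computed. The water-year convention used here (October 1 through September 30, labeled by the calendar year in which the period ends) is the standard USGS definition.

```python
from datetime import date


def water_year(day):
    """Return the USGS water year containing `day`.

    A water year runs from October 1 through September 30 and is
    labeled by the calendar year in which it ends.
    """
    return day.year + 1 if day.month >= 10 else day.year


def area_weighted_mean(values, overlap_areas):
    """Average gridded values over a basin, weighting by overlap area.

    values: one value per grid cell that intersects the basin polygon.
    overlap_areas: area of each cell that falls inside the polygon
        (any consistent unit). Assumed weighting scheme, see the note above.
    """
    if len(values) != len(overlap_areas):
        raise ValueError("values and overlap_areas must have the same length")
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("total overlap area must be positive")
    return sum(v * a for v, a in zip(values, overlap_areas)) / total_area
```

Under this convention, water years 2011–2019 cover October 1, 2010 through September 30, 2019. Applying the same averaging function with subbasin polygons or with whole-basin polygons would give the two kinds of dynamic features distinguished in the Overview, under the same assumption about the weighting.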