Amy M. Russell
Thomas M. Over
William H. Farmer
20200728
Cross-validation results for five statistical methods of daily streamflow estimation at 1,385 reference streamgages in the conterminous United States, Water Years 1981-2017
Tabular digital data
Urbana, IL
U.S. Geological Survey
https://doi.org/10.5066/P9XT4WSP
This data release contains daily time series estimates of natural streamflow for 1,385 streamgages in 19 study regions in the conterminous U.S. from October 1, 1980, through September 30, 2017. These estimates are provided for gages from mostly undisturbed watersheds as defined by Falcone (2011), using five statistical techniques: nearest-neighbor drainage area ratio (NNDAR), map-correlation drainage area ratio (MCDAR), nearest-neighbor nonlinear spatial interpolation using flow duration curves (NNQPPQ), map-correlation nonlinear spatial interpolation using flow duration curves (MCQPPQ), and ordinary kriging of the logarithms of discharge per unit area (OKDAR). Location information and basin characteristics for study gages were obtained from the "Reference" gages of the GAGES-II dataset (Falcone, 2011, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011). Observed daily streamflow data were retrieved from the National Water Information System (NWIS) on September 7, 2018. NNDAR and MCDAR estimates were computed following methods described by Farmer and others (2014). NNQPPQ and MCQPPQ estimates were computed following the same methods, with updates to the flow-duration curve modeling as described by Over and others (2018). OKDAR estimates were computed using pooled variograms for each study region following methods described by Farmer (2016). Daily streamflow estimation was conducted in a leave-one-out cross-validation approach in which each streamgage was treated as if ungaged and all the remaining streamgages in a study region were used to calibrate each method and perform estimations at the "ungaged" site. The observed streamflow records were compared to the five simulated streamflow records to assess the performance of each method. The resulting performance metrics are provided at each gage for all five statistical methods.
References cited:
Falcone, J.A., 2011, GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow [digital spatial dataset]: U.S. Geological Survey Water Resources NSDI Node web page, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011.
Farmer, W.H., Archfield, S.A., Over, T.M., Hay, L.E., LaFontaine, J.H., and Kiang, J.E., 2014, A comparison of methods to predict historical daily streamflow time series in the southeastern United States: U.S. Geological Survey Scientific Investigations Report 2014–5231, 34 p., http://dx.doi.org/10.3133/sir20145231.
Farmer, W.H., 2016, Ordinary kriging as a tool to estimate historical daily streamflow records: Hydrology and Earth System Sciences, v. 20, p. 2721–2735, https://doi.org/10.5194/hess-20-2721-2016.
Over, T.M., Farmer, W.H., and Russell, A.M., 2018, Refinement of a regression-based method for prediction of flow-duration curves of daily streamflow in the conterminous United States: U.S. Geological Survey Scientific Investigations Report 2018–5072, https://doi.org/10.3133/sir20185072.
The purpose of this data release is to inform hydrologic characterization at ungaged locations.
19801001
20170930
reference hydrology
Amy M. Russell
CENTRAL MIDWEST WATER SCIENCE CENTER
Hydrologist
405 N. Goodwin Avenue
Urbana
IL
61801
United States
217-328-9773
arussell@usgs.gov
Location information and basin characteristics for study gages were obtained from the "Reference" gages of the GAGES-II dataset (Falcone, 2011, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011). Index gages were USGS streamgages within each GAGES-II study region that were identified as being of “reference” quality in the GAGES-II dataset with at least 10 complete water years (WYs) during the study period from WY1981 through WY2017. Observed daily streamflow data for 1,385 index gages were retrieved from the National Water Information System (NWIS).
20180907
NNDAR, MCDAR, NNQPPQ, MCQPPQ, and OKDAR estimates of natural streamflow were computed at 1,385 streamgages. NNDAR and MCDAR estimates were computed following methods described by Farmer and others (2014). NNQPPQ and MCQPPQ estimates were computed following methods described by Farmer and others (2014), with updates to the flow-duration curve modeling as described by Over and others (2018). OKDAR estimates were computed using pooled variograms for each study region following methods described by Farmer (2016). Daily streamflow estimation was conducted in a leave-one-out cross-validation approach in which each streamgage was treated as if ungaged and all the remaining streamgages in a study region were used to calibrate each method and perform estimations at the "ungaged" site.
201811
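The leave-one-out cross-validation described in this process step can be illustrated with a minimal Python sketch for the drainage area ratio idea. This is not the procedure of Farmer and others (2014): the use of straight-line distance on planar coordinates, a single index gage, and the dictionary keys `x`, `y`, `area`, and `flow` are all assumptions made for illustration.

```python
import math


def nndar_loocv(gages):
    """Leave-one-out sketch of a nearest-neighbor drainage area ratio estimate.

    gages maps a station id to a dict with hypothetical keys:
    'x', 'y' (planar coordinates), 'area' (drainage area), and
    'flow' (list of daily flows on a common set of dates).
    Returns a dict mapping each station id to its estimated daily flows.
    """
    estimates = {}
    for target_id, target in gages.items():
        # Treat the target as ungaged: only the remaining gages may be index gages.
        candidates = {k: g for k, g in gages.items() if k != target_id}
        # Assumption: "nearest" means smallest Euclidean distance.
        index_id = min(
            candidates,
            key=lambda k: math.hypot(
                candidates[k]["x"] - target["x"],
                candidates[k]["y"] - target["y"],
            ),
        )
        index = candidates[index_id]
        # Scale the index gage's flows by the ratio of drainage areas.
        ratio = target["area"] / index["area"]
        estimates[target_id] = [q * ratio for q in index["flow"]]
    return estimates


# Made-up example values, not taken from the data release.
example = {
    "A": {"x": 0.0, "y": 0.0, "area": 100.0, "flow": [10.0, 20.0]},
    "B": {"x": 1.0, "y": 0.0, "area": 50.0, "flow": [4.0, 8.0]},
    "C": {"x": 10.0, "y": 0.0, "area": 200.0, "flow": [30.0, 60.0]},
}
estimates = nndar_loocv(example)
```

Each estimate can then be compared with the withheld observed record at the same gage, which is the comparison the performance metrics in this release summarize.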
region##.zip, where ## represents the GAGES-II HUC02 study region. Each archive contains an individual tab-delimited text file for each reference gage.
Estimates of daily streamflow at reference gages. Each text file is named output_#### where the #'s represent the USGS Station ID number (8-15 digits).
date
Date of observed and estimated streamflow
1980-10-01
2017-09-30
date in YYYY-MM-DD format
obs
Computed daily mean streamflow reported from NWIS for given USGS station.
-2090
178000
cubic feet per second (cfs)
obsP
Exceedance probability of observed streamflow – the probability of the observed streamflow being equaled or exceeded on any given day. When the observed streamflow is zero, obsP is set to ‘NA’ in this study.
0
1
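As an illustration of the obsP definition, the sketch below computes an empirical exceedance probability for each observed flow. This record does not state which plotting position was used; the Weibull form, count of flows equaled or exceeded divided by (n + 1), is an assumption, as is counting zero-flow days in n. Zero flows are returned as None to mirror the 'NA' convention.

```python
def exceedance_probability(flows):
    """Empirical exceedance probability for each daily flow.

    Assumption: Weibull plotting position, count / (n + 1), where count is
    the number of days with flow greater than or equal to the given flow.
    Zero flows map to None, mirroring the 'NA' convention for obsP.
    This simple version is quadratic in the number of days.
    """
    n = len(flows)
    probs = []
    for q in flows:
        if q == 0:
            probs.append(None)
            continue
        equaled_or_exceeded = sum(1 for other in flows if other >= q)
        probs.append(equaled_or_exceeded / (n + 1))
    return probs


# Made-up flows in cubic feet per second.
probs = exceedance_probability([5.0, 0.0, 20.0, 10.0])
```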
NNDAR
Estimate of daily streamflow using the Nearest-Neighbor Drainage Area Ratio (NNDAR) method. NNDAR estimates were computed following methods described by Farmer and others (2014).
-281.54
306485.8
cubic feet per second (cfs)
MCDAR
Estimate of daily streamflow using the Map-Correlation Drainage Area Ratio (MCDAR) method. MCDAR estimates were computed following methods described by Farmer and others (2014).
0
335188.9
cubic feet per second (cfs)
NNQPPQ
Estimate of daily streamflow using the nearest-neighbor nonlinear spatial interpolation using flow duration curves (NNQPPQ) method. NNQPPQ estimates were computed following methods described by Farmer and others (2014), with updates to the flow-duration curve modeling as described by Over and others (2018).
0
15882161
cubic feet per second (cfs)
MCQPPQ
Estimate of daily streamflow using the map-correlation nonlinear spatial interpolation using flow duration curves (MCQPPQ) method. MCQPPQ estimates were computed following methods described by Farmer and others (2014), with updates to the flow-duration curve modeling as described by Over and others (2018).
0
15882161
cubic feet per second (cfs)
OKDAR
Estimate of daily streamflow using the Ordinary Kriging of the logarithms of discharge per unit area (OKDAR) method. OKDAR estimates were computed using pooled variograms for each study region following methods described by Farmer (2016).
0
210531.9
cubic feet per second (cfs)
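The per-gage text files described above can be read with the Python standard library. The sketch below assumes each file has a header row whose column names match the attribute labels listed above (date, obs, obsP, NNDAR, MCDAR, NNQPPQ, MCQPPQ, OKDAR) and that missing values appear as the text NA. Both points are assumptions to check against an actual file, and the sample rows are invented.

```python
import csv
import io

METHODS = ["NNDAR", "MCDAR", "NNQPPQ", "MCQPPQ", "OKDAR"]


def parse_value(text):
    """Convert a field to float, mapping the text 'NA' to None."""
    return None if text == "NA" else float(text)


def read_gage_file(handle):
    """Read one tab-delimited gage file into a list of row dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        row = {"date": record["date"]}
        for name in ["obs", "obsP"] + METHODS:
            row[name] = parse_value(record[name])
        rows.append(row)
    return rows


# Invented sample content standing in for a real output_#### file.
sample = (
    "date\tobs\tobsP\tNNDAR\tMCDAR\tNNQPPQ\tMCQPPQ\tOKDAR\n"
    "1980-10-01\t12.0\t0.45\t11.2\t12.9\t10.8\t13.1\t12.4\n"
    "1980-10-02\t0\tNA\t0.3\t0\t0.1\t0\t0.2\n"
)
rows = read_gage_file(io.StringIO(sample))
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.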
CONUS_PMs_byStation.csv
Computed Performance Metrics for each statistical method of streamflow estimation. Many of the performance metrics use a log-transformation of the daily streamflow; therefore, observed and simulated streamflows of zero were set to 0.001 cfs in computation of performance metrics.
Region
Study region for each streamgage
01
18
StationID
Unique USGS streamgage identification number
Method
Statistical method of daily time series estimation
NNDAR
Drainage area ratio method with nearest neighbor index selection
MCDAR
Drainage area ratio method with map correlation index selection
NNQPPQ
QPPQ simulation method with nearest neighbor index gage selection
MCQPPQ
QPPQ simulation method with map correlation index gage selection
OKDAR
Ordinary kriging of logarithms of discharge per unit area
PM
Performance metric used to evaluate streamflow estimates
nse
Nash-Sutcliffe efficiency of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
nsel
Nash-Sutcliffe efficiency of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmse
Root mean square error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmsne
Root mean square normalized error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
nrmse
Normalized root mean square error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
cvrmse
Coefficient of variation of the root mean square error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmsel
Root mean square error of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmsnel
Root mean square normalized error of log-transformed daily streamflows. This statistic is undefined when the divisor is zero. Because the logarithm of 1 is zero, any site with a single day of observed flow of 1 cfs will contain ‘Inf’ for this performance metric.
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
nrmsel
Normalized root mean square error of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
cvrmsel
Coefficient of variation of the root mean square error of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
perr
Average percent errors of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
meandiffl
Mean of the differences of the log-transformed daily streamflows
Producer defined
cor.p
Pearson correlations of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
cor.s
Spearman correlations of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
MeanRat
Ratio of mean of daily streamflows
Definition provided in SIR 2018-5072 (Over and others, 2018)
VarRat
Ratio of coefficient of variation of daily streamflows
Definition provided in SIR 2018-5072 (Over and others, 2018)
logMeanRat
Ratio of mean of log-transformed daily streamflows
Modified from SIR 2018-5072 (Over and others, 2018)
logVarRat
Ratio of coefficient of variation of log-transformed daily streamflows
Modified from SIR 2018-5072 (Over and others, 2018)
q0.###, where ### is a flow quantile
Ratio of 0.### quantile of daily streamflows
Definition provided in SIR 2018-5072 (Over and others, 2018)
Value
Numeric value of performance metric
-3.634639e+04
9.437883e+12
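To illustrate how the zero-flow substitution interacts with the log-transformed metrics, the sketch below computes the Nash-Sutcliffe efficiency in its standard form, with and without a log transform. The authoritative definitions are those in SIR 2014-5231; the use of the natural logarithm here is an assumption, and negative flows, which occur in the observed record, are not handled.

```python
import math

ZERO_FLOW_CFS = 0.001  # substitute for zero flows, as stated in this record


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def log_flows(flows):
    """Log-transform flows, replacing zeros with 0.001 cfs first.

    Assumption: natural logarithm. Negative flows would raise ValueError.
    """
    return [math.log(q if q != 0 else ZERO_FLOW_CFS) for q in flows]


def nsel(obs, sim):
    """Nash-Sutcliffe efficiency of log-transformed flows."""
    return nse(log_flows(obs), log_flows(sim))


# Made-up flows in cubic feet per second.
perfect = nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
perfect_log = nsel([0.0, 1.0, 10.0], [0.0, 1.0, 10.0])
```

A value of 1 indicates a perfect match, and a value of 0 indicates an estimate no better than the observed mean.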
CONUS_PM__summaries_byRegion.csv
Summaries of Performance Metrics by study region with infinite values removed.
Region
Study region
GAGES-II dataset (Falcone, 2011, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011)
01
18
Method
Statistical method of daily time series estimation
NNDAR
Drainage area ratio method with nearest neighbor index selection
MCDAR
Drainage area ratio method with map correlation index selection
NNQPPQ
QPPQ simulation method with nearest neighbor index gage selection
MCQPPQ
QPPQ simulation method with map correlation index gage selection
OKDAR
Ordinary kriging of logarithms of discharge per unit area
PM
Performance metric used to evaluate streamflow estimates
nse
Nash-Sutcliffe efficiency of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
nsel
Nash-Sutcliffe efficiency of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmse
Root mean square error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmsne
Root mean square normalized error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
nrmse
Normalized root mean square error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
cvrmse
Coefficient of variation of the root mean square error of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmsel
Root mean square error of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
rmsnel
Root mean square normalized error of log-transformed daily streamflows. This statistic is undefined when the divisor is zero. Sites containing ‘Inf’ for this performance metric were not included in the summary statistic calculations.
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
nrmsel
Normalized root mean square error of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
cvrmsel
Coefficient of variation of the root mean square error of log-transformed daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
perr
Average percent errors of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
meandiffl
Mean of the differences of the log-transformed daily streamflows
Producer defined
cor.p
Pearson correlations of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
cor.s
Spearman correlations of daily streamflows
Definition provided in SIR 2014-5231 (Farmer and others, 2014)
MeanRat
Ratio of mean of daily streamflows
Definition provided in SIR 2018-5072 (Over and others, 2018)
VarRat
Ratio of coefficient of variation of daily streamflows
Definition provided in SIR 2018-5072 (Over and others, 2018)
logMeanRat
Ratio of mean of log-transformed daily streamflows
Modified from SIR 2018-5072 (Over and others, 2018)
logVarRat
Ratio of coefficient of variation of log-transformed daily streamflows
Modified from SIR 2018-5072 (Over and others, 2018)
q0.###, where ### is a flow quantile
Ratio of 0.### quantile of daily streamflows
Definition provided in SIR 2018-5072 (Over and others, 2018)
mean
Average value of PM for all gages in each region by method
-5.378312e+02
1.970374e+11
median
Median value of PM for all gages in each region by method
-1.878265e+00
5.554143e+06
min
Minimum value of PM for all gages in each region by method
-36346.39
40634.88
max
Maximum value of PM for all gages in each region by method
1.451997e-01
9.437883e+12
n
Number of reference gages in each region
13
199
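The regional summaries described above exclude infinite values before computing statistics. A minimal sketch of that step is shown below. It follows the field definitions in this record, but whether the published n counts all reference gages or only those with finite values is not stated here, so reporting the full count is an assumption.

```python
import math
import statistics


def summarize(values):
    """Summarize one performance metric across the gages of one region and method.

    Non-finite values (Inf, -Inf, NaN) are dropped before computing statistics.
    Assumption: n is the number of gages supplied, including dropped ones.
    """
    finite = [v for v in values if math.isfinite(v)]
    return {
        "mean": statistics.mean(finite),
        "median": statistics.median(finite),
        "min": min(finite),
        "max": max(finite),
        "n": len(values),
    }


# Made-up metric values for four gages, one of them infinite.
summary = summarize([1.0, 3.0, float("inf"), 2.0])
```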
reference_gages_summary.csv
Summary of station information
REGION
Study region as assigned by GAGES-II HUC02 field
GAGES-II dataset (Falcone, 2011, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011)
1
18
STAID
Unique USGS streamgage identification number
U.S. Geological Survey - National Water Information System
U.S. Geological Survey Site Number
National Water Information System (NWIS) database
MEAN_DAILY_FLOW
Average of daily flow data for each site
0
8492.1
cubic feet per second (cfs)
DRAIN_SQKM
Drainage area
GAGES-II dataset (Falcone, 2011, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011)
1.5
25791.0
Square kilometers
START_DATE
Beginning date of streamflow record used in study
1980-10-01
2007-10-01
Date in YYYY-MM-DD format
END_DATE
Ending date of streamflow record used in study
1990-09-30
2017-09-30
Date in YYYY-MM-DD format
COMP_WY_RANGE
List of complete water years included in study
List of water years with complete streamflow records; discontinuous time periods are separated by ";"
COMP_WY_COUNT
Number of complete water years included in study
10
37
years
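The water-year fields above follow the convention used throughout this record, in which water year 1981 runs from October 1, 1980, through September 30, 1981. The sketch below assigns a water year to a date and lists water years with a value on every day. Treating "complete" as "no missing days" is an assumption, because the completeness criterion is not spelled out in this record.

```python
from collections import Counter
from datetime import date, timedelta


def water_year(d):
    """Water year of a date: October through December belong to the next year."""
    return d.year + 1 if d.month >= 10 else d.year


def complete_water_years(dates):
    """Return the sorted water years for which every day is present.

    Assumption: a water year is complete when the number of distinct dates
    equals the number of days from October 1 to the following September 30.
    """
    counts = Counter(water_year(d) for d in set(dates))
    complete = []
    for wy, n_days in sorted(counts.items()):
        expected = (date(wy, 10, 1) - date(wy - 1, 10, 1)).days
        if n_days == expected:
            complete.append(wy)
    return complete


# 370 consecutive days: all of water year 1981 plus five days of water year 1982.
start = date(1980, 10, 1)
days = [start + timedelta(days=i) for i in range(370)]
complete = complete_water_years(days)
```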
20200813
