Brent T. Aulenbach
Joshua C. Henley
Kristina G. Hopkins
20230404
13: Models coefficients and statistics for regression models used to estimate streamwater loads for 12 water-quality constituents in 13 watersheds in Gwinnett County, Georgia for water years 2003-2020
tabular digital data
Reston, VA
U.S. Geological Survey
https://doi.org/10.5066/P9G8HZTY
Brent T. Aulenbach
Joshua C. Henley
Kristina G. Hopkins
2023
Hydrology, water-quality, and watershed characteristics in 15 watersheds in Gwinnett County, Georgia, water years 2002-20
publication
USGS Scientific Investigations Report
2023-5035
Reston, Virginia
U.S. Geological Survey
The dataset contains model coefficients and statistics for the 488 regression models used to estimate streamwater constituent loads for 13 watersheds in Gwinnett County, Georgia for two calibration periods, water years 2003-2010 and 2010-2020. Model terms were selected from an 11-parameter equation, which was a function of discharge, base flow, season, turbidity, and time (trend), using a forward stepwise ordinary least squares regression approach. Model coefficients were fit using U.S. Geological Survey (USGS) LOADEST load estimation software. Models were fit both with and without turbidity explanatory variables for 12 water-quality constituents: total suspended solids, suspended sediment concentration, total nitrogen, total nitrate plus nitrite, total phosphorus, dissolved phosphorus, total organic carbon, total calcium, total magnesium, total lead, total zinc, and total dissolved solids. The dataset includes a summary of sample concentrations used to calibrate the models (period of samples collected, number of concentrations, number of censored concentrations, and number of outliers removed), model coefficients, and selected model statistics (concentration and load model R-squares, estimated residual variance, serial correlation in the model residuals, and Turnbull-Weiss normality test statistic of residuals). Portable document format files of LOADEST output, which contain model diagnostic statistics and plots for evaluating model fits, are provided for each model in a “zip” file.
The purpose of this dataset is to make accessible the load estimation model coefficients, summary statistics, and USGS LOADEST load estimation software output containing model diagnostic statistics and plots for evaluating model fits. These models were used in estimating streamwater constituent loads in support of the USGS Scientific Investigations Report "Hydrology, water-quality, and watershed characteristics in 15 watersheds in Gwinnett County, Georgia, water years 2002-20". This study is part of a long-term program to monitor and analyze the hydrologic and water-quality conditions of 15 watersheds in Gwinnett County, Georgia by the USGS in cooperation with Gwinnett County Department of Water Resources.
20021001
20200930
publication date
None planned
-84.2721
-83.8468
34.1514
33.7676
ISO 19115 Topic Category
inlandWaters
USGS Thesaurus
surface water quality
streamflow
mathematical modeling
USGS Metadata Identifier
USGS:635041e7d34e47431c15c5c7
Geographic Names Information System
Gwinnett County
State of Georgia
None. Please see 'Distribution Info' for details.
None. Users are advised to read the dataset's metadata thoroughly to understand appropriate use and data limitations.
Brent T Aulenbach
U.S. Geological Survey, Southeast Region
Research Hydrologist
mailing address
1770 Corporate Drive Suite 500
Norcross
GA
30093
US
678-924-6626
678-924-6710
btaulenb@usgs.gov
Funded through a cooperative agreement between Gwinnett County Department of Water Resources and the U.S. Geological Survey
MacOS Big Sur, version 11.7.1; Microsoft Excel for Mac, version 16.67, 13_GwinnettCoGa_LOADESTModels.csv 143 KB, 13_GwinnettCoGa_LOADESTModelOutput.zip 117.7 MB
The reported concentration and load model R-squares, also known as coefficients of determination, indicate the strength of the model fits and represent the fraction of the variance in concentration or load explained by the model. The reported estimated residual variance is an indication of the uncertainty in the load estimates.
No formal logical accuracy tests were conducted.
Dataset is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.
Regression models were fit for 12 water-quality constituents and for the 13 study watersheds that had sufficiently long records for estimating streamwater constituent loads. For models that included turbidity as an explanatory variable, a second model without turbidity terms was also fit, which was used to estimate loads when turbidity data were not available. To improve model parameter fits, separate models were fit for water years 2003-2010 and 2010-2020. Fitting models to these two shorter periods allowed the models to better capture temporal changes in the model relations and detect temporally changing trends.
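The calibration periods above are expressed in water years. As a minimal sketch, assuming the usual USGS convention that a water year runs from October 1 through September 30 and is named for the calendar year in which it ends (consistent with the October 1, 2002 to September 30, 2020 period of this record), a sample date could be assigned to a water year as follows. The function name water_year is hypothetical and is not part of the dataset.

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the water year containing date d.

    Assumes the Oct 1 - Sep 30 convention, with the water year
    named for the calendar year in which it ends.
    """
    return d.year + 1 if d.month >= 10 else d.year


# First and last days of the period covered by this record.
first_wy = water_year(date(2002, 10, 1))
last_wy = water_year(date(2020, 9, 30))
```

Under this convention the record spans water years 2003 through 2020, matching the two calibration periods described above.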
Loads were estimated using an 11-parameter regression model, which was a function of discharge, base flow, season, turbidity, and time (trend) and had separate model parameters for the intercept, streamflow, and turbidity variables to account for different relations during base-flow and stormflow conditions. The optimal set of model terms was determined using a forward-stepwise regression and the corrected Akaike Information Criterion (AICc; Akaike, 1974; Burnham and Anderson, 2002). Model-term selection was done in JMP(R) statistical data analysis software version 16.1.0 (SAS Institute Inc., 1989-2021) using an ordinary least squares (OLS) fitting approach. The OLS fitting process does not explicitly handle censored water-quality data--concentrations that are below the analytical detection limit--so one-half the laboratory reporting limit was used as the concentrations for these samples for this process (Helsel, 1990; U.S. Environmental Protection Agency, 1991). The streamflow parameter was always included in the stepwise regression because it is a mandatory explanatory variable in the LOADEST model. The seasonal sine and cosine model terms worked as a single variable and as such were both included in a model even if only one term was deemed significant in the model fit. If the stepwise regression only selected the time-square term of the two time-trend terms, the linear time term was added to the model because LOADEST always fits a linear term when the second-order polynomial function is selected. Final model coefficients were determined using the USGS LOAD ESTimator software (LOADEST; Runkel and others, 2004). The adjusted maximum likelihood estimation (AMLE) algorithm (Cohn, 1988; Cohn and others, 1989, 1992) was used to fit the models and estimate loads. The AMLE approach appropriately handles censored water-quality data such that load estimates should have negligible bias (Cohn and others, 1992).
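Two steps of the term-selection process described above can be illustrated with a short sketch. It assumes the common least-squares form of the corrected Akaike Information Criterion, AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n-k-1); JMP may count parameters or include constants differently, so the values are only comparable among models scored by the same function. The function names are hypothetical.

```python
import math


def aicc_least_squares(rss: float, n: int, k: int) -> float:
    """Corrected AIC for a least-squares fit (assumed common form).

    rss: residual sum of squares, n: number of observations,
    k: number of estimated parameters (counting convention is an assumption).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def substitute_censored(value: float, censored: bool) -> float:
    """Return one-half the reporting limit for a censored result.

    For a censored sample, `value` is taken to be the laboratory
    reporting limit; uncensored concentrations pass through unchanged.
    """
    return 0.5 * value if censored else value
```

In a forward-stepwise search, a candidate term would be kept only if adding it lowers the AICc. The half-reporting-limit substitution applies only to this OLS selection step; the final coefficients come from the AMLE fit, which handles censoring directly.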
Model fitting was performed between October 29, 2021 and January 18, 2022.
References:
Akaike, H., 1974, A new look at the statistical model identification: IEEE Transactions on Automatic Control, v. 19, no. 6, p. 716-723, accessed July 11, 2020, at https://doi.org/10.1109/TAC.1974.1100705.
Burnham, K.P., and Anderson, D.R., 2002, Model selection and multimodel inference--A practical information-theoretic approach (2d ed.): New York, Springer-Verlag, ISBN-13: 978-0387953649.
Cohn, T.A., 1988, Adjusted maximum likelihood estimation of the moments of lognormal populations from type I censored samples: U.S. Geological Survey Open-File Report 88-350, 34 p., accessed July 11, 2020, at https://doi.org/10.3133/ofr88350.
Cohn, T.A., Caulder, D.L., Gilroy, E.J., Zynjuk, L.D., and Summers, R.M., 1992, The validity of a simple statistical model for estimating fluvial constituent loads: An empirical study involving nutrient loads entering Chesapeake Bay: Water Resources Research, v. 28, no. 9, p. 2353-2363, accessed July 11, 2020, at https://doi.org/10.1029/92WR01008.
Cohn, T.A., DeLong, L.L., Gilroy, E.J., Hirsch, R.M., and Wells, D.K., 1989, Estimating constituent loads: Water Resources Research, v. 25, no. 5, p. 937-942, accessed July 11, 2020, at https://doi.org/10.1029/WR025i005p00937.
Helsel, D.R., 1990, Less than obvious--statistical treatment of data below the detection limit: Environmental Science and Technology, v. 24, no. 12, p. 1766-1774, accessed August 3, 2020, at https://doi.org/10.1021/es00082a001.
Runkel, R.L., Crawford, C.G., and Cohn, T.A., 2004, Load estimator (LOADEST)--A FORTRAN program for estimating constituent loads in streams and rivers: U.S. Geological Survey Techniques and Methods, book 4, chap. A5, 69 p., accessed July 11, 2020, at https://doi.org/10.3133/tm4A5.
SAS Institute Inc., 1989-2021, JMP, ver. 16.1.0: Cary, North Carolina.
U.S. Environmental Protection Agency [EPA], 1991, Chemical concentration data near the detection limit: U.S. Environmental Protection Agency, EPA 903-8-91-001, 4 p., accessed August 3, 2020, at https://nepis.epa.gov/Exe/ZyPURL.cgi?Dockey=9100S9OP.txt.
20220118
13_GwinnettCoGa_LOADESTModels.csv
Comma Separated Value (CSV) file containing model coefficients and statistics for the 488 regression models
Producer Defined
Model_num
Model number
Producer Defined
1
488
Site_nu
U.S. Geological Survey site number
Producer Defined
"02205865"
Sweetwater Creek at Club Drive near Lilburn, Ga.
Producer defined
"02207120"
Yellow River at Ga 124, near Lithonia, Ga.
Producer defined
"02207185"
No Business Creek at Lee Road, below Snellville, Ga.
Producer defined
"02207385"
Big Haynes Creek at Lenora Road, near Snellville, Ga.
Producer defined
"02207400"
Brushy Fork Creek at Beaver Road, nr Loganville, Ga.
Producer defined
"02208150"
Alcovy River at New Hope Road, near Grayson, Ga.
Producer defined
"02217274"
Wheeler Creek at Bill Cheek Road, near Auburn, Ga.
Producer defined
"02218565"
Apalachee River at Fence Road, near Dacula, Ga.
Producer defined
"02334480"
Richland Creek at Suwanee Dam Road, near Buford, Ga.
Producer defined
"02334578"
Level Creek at Suwanee Dam Road, near Suwanee, Ga.
Producer defined
"02334885"
Suwanee Creek at Suwanee, Ga.
Producer defined
"02335350"
Crooked Creek near Norcross, Ga.
Producer defined
"02336030"
North Fork Peachtree Creek at Graves Rd, near Doraville, Ga.
Producer defined
Watershed
Watershed names used in the associated U.S. Geological Survey Scientific Investigations Report. Watershed drainage area defined by site specified in value definition shown in parentheses
Producer Defined
Sweetwater Creek
Sweetwater Creek at Club Drive near Lilburn, Ga. (USGS site number 02205865)
Producer defined
Yellow River near Lithonia
Yellow River at Ga 124, near Lithonia, Ga. (USGS site number 02207120)
Producer defined
No Business Creek
No Business Creek at Lee Road, below Snellville, Ga. (USGS site number 02207185)
Producer defined
Big Haynes Creek
Big Haynes Creek at Lenora Road, near Snellville, Ga. (USGS site number 02207385)
Producer defined
Brushy Fork Creek
Brushy Fork Creek at Beaver Road, nr Loganville, Ga. (USGS site number 02207400)
Producer defined
Alcovy River
Alcovy River at New Hope Road, near Grayson, Ga. (USGS site number 02208150)
Producer defined
Wheeler Creek
Wheeler Creek at Bill Cheek Road, near Auburn, Ga. (USGS site number 02217274)
Producer defined
Apalachee River
Apalachee River at Fence Road, near Dacula, Ga. (USGS site number 02218565)
Producer defined
Richland Creek
Richland Creek at Suwanee Dam Road, near Buford, Ga. (USGS site number 02334480)
Producer defined
Level Creek
Level Creek at Suwanee Dam Road, near Suwanee, Ga. (USGS site number 02334578)
Producer defined
Suwanee Creek
Suwanee Creek at Suwanee, Ga. (USGS site number 02334885)
Producer defined
Crooked Creek
Crooked Creek near Norcross, Ga. (USGS site number 02335350)
Producer defined
North Fork Peachtree Creek
North Fork Peachtree Creek at Graves Rd, near Doraville, Ga. (USGS site number 02336030)
Producer defined
Constit
Chemical constituent abbreviated name
Producer Defined
TSS
Total suspended solids (USGS parameter code 00530)
Producer defined
SSC
Suspended sediment concentration (USGS parameter code 80154)
Producer defined
TN
Total nitrogen (USGS parameter code 00600)
Producer defined
DP
Dissolved phosphorus (USGS parameter code 00666)
Producer defined
TP
Total phosphorus (USGS parameter code 00665)
Producer defined
TOC
Total organic carbon (USGS parameter code 00680)
Producer defined
Ca
Total calcium (USGS parameter code 00916)
Producer defined
Mg
Total magnesium (USGS parameter code 00927)
Producer defined
TPb
Total lead (USGS parameter code 01051)
Producer defined
TZn
Total zinc (USGS parameter code 01092)
Producer defined
TDS
Total dissolved solids (USGS parameter code 70300)
Producer defined
NO3NO2
Total nitrate plus nitrite (USGS parameter code 00630)
Producer defined
Model_type
Load estimation model type
Producer Defined
QOnly
Streamflow-only load model - model does not include turbidity model parameters
Producer defined
QT
Streamflow-turbidity load model - model includes at least one of the two turbidity model parameters
Producer defined
Strt_Cdate
Start date of sampling period used to calibrate the load estimation model
Producer Defined
10/10/2002
3/10/2010
m/d/yyyy
End_CDate
End date of sampling period used to calibrate the load estimation model
Producer Defined
6/9/2010
9/29/2020
m/d/yyyy
Obs_num
Number of observations (concentrations) used in calibrating the model
Producer Defined
25
95
Number of concentrations
1
CenObs_num
Number of censored concentrations in the model calibration dataset. Censored values are concentrations that are below their analytical detection limit
Producer Defined
0
21
Number of censored concentrations
1
Outliers_n
Number of outlier concentrations that were excluded from the calibration datasets
Producer Defined
0
15
Number of outlier concentrations excluded
1
Time_step
Length of the computational time step at which loads were estimated, chosen to accommodate the use of storm composite sample concentrations
Producer Defined
4
12
hours
1
ConcModRSq
Concentration model regression coefficient of determination (model R-square) -- represents the fractional amount of the variance in concentration explained by the model
Producer Defined
-0.012
0.987
0.001
LoadModRSq
Load model regression coefficient of determination (model R-square) -- represents the fractional amount of the variance in load explained by the model
Producer Defined
0.753
0.999
0.001
ModParms_n
Number of model parameters (includes intercept, biny9J, lnQ, lnQ.BaseQ, lnQb, lnTurb, lnTurbStrm, sinDECTIME and cosDECTIME (seasonal sine and cosine parameters count as one parameter), DECTIME, and DECTIME2)
Producer Defined
2
10
Number of model parameters
1
Intercept
Model intercept. (Corresponds to model coefficient a0 in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
-7.15433926
14.6890124
biny9J
Additional model intercept applied only during stormflow conditions. (Corresponds to model coefficient a1 for stormflow condition parameter S in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
NA
Not applicable. Model term not included in regression model because it was not significant
Producer defined
-3.5025866
7.09068379
lnQ
Model coefficient for the natural logarithm of streamflow parameter. (Corresponds to model coefficient a2 for average streamflow parameter lnQ in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
0.34135478
2.6006601
natural logarithm of streamflow in cubic feet per second
lnQ.BaseQ
Model coefficient for the natural logarithm of streamflow during base-flow conditions parameter. (Corresponds to model coefficient a3 for average streamflow during base-flow conditions parameter (1-S)lnQ in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
NA
Not applicable. Model term not included in regression model because it was not significant
Producer defined
-1.41539886
1.61265958
natural logarithm of streamflow in cubic feet per second
lnQb
Model coefficient for the natural logarithm of daily base-flow. (Corresponds to model coefficient a4 for daily base-flow parameter lnQb in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
NA
Not applicable. Model term not included in regression model because it was not significant
Producer defined
lnTurb
Model coefficient for the natural logarithm of turbidity parameter. (Corresponds to model coefficient a5 for average turbidity parameter lnT in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
NA
Not applicable. Model term not included in regression model because it was not significant (Model_type = QOnly and has no corresponding Model_type = QT model) or was a streamflow-only flow model (Model_type = QOnly and has a corresponding Model_type = QT model)
Producer defined
-0.3502531
1.5745366
natural logarithm of turbidity in Formazin Nephelometric Units
lnTurbStrm
Model coefficient for the natural logarithm of turbidity during stormflow conditions parameter. This corresponds to the lnTurb.Storm parameter name in the LOADEST output pdf files. (Corresponds to model coefficient a6 for average turbidity during stormflow conditions parameter SlnT in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
.
Not applicable. Model term not included in regression model because it was not significant (Model_type = QOnly and has no corresponding Model_type = QT model) or was a streamflow-only flow model (Model_type = QOnly and has a corresponding Model_type = QT model)
Producer defined
-0.6983179
1.19018411
natural logarithm of turbidity in Formazin Nephelometric Units
sinDECTIME
Model seasonal sine coefficient parameter. This corresponds to the sin.DECTIME parameter name in the LOADEST output pdf files. (Corresponds to model coefficient a7 for seasonal parameter sin-theta in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
.
Not applicable. Model term not included in regression model because neither seasonal term was significant
Producer defined
-0.4830597
0.3092907
sine of day of year
cosDECTIME
Model seasonal cosine coefficient parameter. This corresponds to the cos.DECTIME parameter name in the LOADEST output pdf files. (Corresponds to model coefficient a8 for seasonal parameter cos-theta in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
.
Not applicable. Model term not included in regression model because neither seasonal term was significant
Producer defined
-0.72929817
0.28354545
cosine of day of year
DECTIME
Model centered (linear) time trend coefficient parameter -- time is centered by subtracting the centered time Date_CV. (Corresponds to model coefficient a9 for time-trend parameter time in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
.
Not applicable. Model term not included in regression model because neither the Date nor the Date_Sq model term was significant
Producer defined
-0.2629871
0.1279981
Decimal years
DECTIME2
Model centered time-squared trend coefficient parameter -- time is centered by subtracting the centered time Date_CV. (Corresponds to model coefficient a10 for time-trend parameter time^2 in equation 1 of associated USGS Scientific Investigations Report Aulenbach and others [2023])
Producer Defined
.
Not applicable. Model term not included in regression model because it was not significant
Producer defined
-0.17282491
0.13887025
Decimal years
Date_CV
Centered time
Producer Defined
2006.45
2015.56
Decimal years
0.01
Resid_Var
Estimated variance in the residuals to the load regression model
Producer Defined
0.00395
1.053
0.00001
SerCorrRes
Serial correlation in the model residuals
Producer Defined
-0.4318
0.5588
0.0001
TWNormTest
Turnbull-Weiss normality test statistic p-value of model residuals. A p-value of less than 0.05 indicates that the model did not have normally distributed residuals and that load estimates may not be optimal for censored data
Producer Defined
8.39e-06
0.9964
Resid_Norm
Residual normality based on the Turnbull-Weiss normality test statistic p-value
Producer Defined
Normal
Residuals could have come from a normal distribution -- Turnbull-Weiss normality test statistic p-value >=0.05
Producer defined
Not normal
Residuals are not from a normal distribution -- Turnbull-Weiss normality test statistic p-value <0.05
Producer defined
GB_Lower
Grubbs and Beck outlier test criteria lower limit -- model residuals below limit indicates possible low outlier
Producer Defined
-3.366
-2.822
Standard deviations
0.001
GB_Upper
Grubbs and Beck outlier test criteria upper limit -- model residuals above limit indicates possible high outlier
Producer Defined
2.822
3.366
Standard deviations
0.001
Resid_Low
Minimum normalized model residual
Producer Defined
-3.3
-1.225
Standard deviations
0.001
Resid_Up
Maximum normalized model residual
Producer Defined
1.227
3.487
Standard deviations
0.001
Res_OutLow
Indication of a low outlier(s) -- based on comparison of the minimum normalized model residual with the Grubbs and Beck outlier test criteria lower limit
Producer Defined
no
No low outliers identified -- Minimum normalized model residual greater than the Grubbs and Beck outlier test criteria lower limit
Producer defined
Res_OutUp
Indication of a high outlier(s) -- based on comparison of the maximum normalized model residual with the Grubbs and Beck outlier test criteria upper limit
Producer Defined
no
No high outliers identified -- Maximum normalized model residual less than the Grubbs and Beck outlier test criteria upper limit
Producer defined
yes
Possible high outlier(s) present -- Maximum normalized model residual greater than the Grubbs and Beck outlier test criteria upper limit
Producer defined
ModOut_FN
LOADEST Model Output File Name of portable document format file
Producer Defined
LOADEST Model Output File Name of portable document format file containing model statistics and plots for evaluating model fits
This dataset is a compilation of the model coefficients and statistics for the 488 load regression models that were developed for the Gwinnett County, Georgia study (Aulenbach and others, 2023). Load regression models were fit for 12 water-quality constituents in 13 watersheds. These regression models were then used in conjunction with the LOADEST estimation dataset to estimate streamwater constituent loads for water years 2003 to 2020.
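As a minimal sketch of how the model summary file might be read, the example below parses comma-separated text with Python's standard csv module and counts models by constituent and model type. The column names (Model_num, Site_nu, Constit, Model_type) come from this metadata record; the three sample rows are made up for illustration and are not values from 13_GwinnettCoGa_LOADESTModels.csv. The function name count_models is hypothetical.

```python
import csv
import io
from collections import Counter


def count_models(csv_text: str) -> Counter:
    """Count models per (constituent, model type) in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["Constit"], row["Model_type"]) for row in reader)


# Illustrative rows only; a real run would read the released CSV file instead.
sample = (
    "Model_num,Site_nu,Constit,Model_type\n"
    '1,"02205865",TSS,QT\n'
    '2,"02205865",TSS,QOnly\n'
    '3,"02207120",TN,QOnly\n'
)

counts = count_models(sample)
# DictReader returns every field as text, so site numbers keep their leading zero.
first_site = next(csv.DictReader(io.StringIO(sample)))["Site_nu"]
```

Reading fields as text is a deliberate choice here: converting Site_nu to a number would drop the leading zero that is part of the USGS site number.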
Regression models were fit using an 11-parameter equation, which was a function of discharge, base flow, season, turbidity, and time (trend). Models were fit for constituent-watershed combinations that had moderate to strong concentration-model relations, as indicated by concentration model R-square >0.20. (An alternative method was used to estimate loads when concentration-model relations were weak.) For each constituent-watershed combination, separate models were fit with and without turbidity-based explanatory variables. Separate models were fit for water years 2003-2010 and 2010-2020 to improve model predictions. Models were fit for 12 constituents: total suspended solids (TSS), suspended sediment concentration (SSC), total nitrogen (TN), total nitrate plus nitrite (NO3+NO2), total phosphorus (TP), dissolved phosphorus (DP), total organic carbon (TOC), total calcium (Ca), total magnesium (Mg), total lead (TPb), total zinc (TZn), and total dissolved solids (TDS). Model parameter selection was done using a forward stepwise regression. Final model coefficients were determined from the USGS LOADEST load estimation software (Runkel and others, 2004).
The dataset includes a summary of sample concentrations used to calibrate the models including the period of samples used to calibrate the model, the number of samples, the number of samples with censored concentrations, and the number of outliers excluded from the calibration dataset. The dataset also includes the model coefficients. Model coefficients assume that loads are estimated in pounds per day, flows are in cubic feet per second, turbidity is in Formazin Nephelometric Units, seasonal terms are in day of year, and time trend terms are in centered decimal year. Applying the model coefficients directly will not reproduce the load estimates exactly because LOADEST applies an algorithm to account for retransformation bias when the logarithmic model is transformed back to linear space.
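The sketch below shows one way the coefficients in a row of the model summary could be combined into the natural logarithm of load, following the attribute definitions in this record (intercept, stormflow indicator S, lnQ, (1-S)lnQ, lnQb, lnT, SlnT, seasonal sine and cosine, and centered time and time squared). Several details are assumptions that should be checked against equation 1 of Aulenbach and others (2023): the seasonal angle is taken as 2*pi times decimal time, S is a 0 or 1 stormflow indicator, lnQ is not centered, and terms coded NA or "." contribute zero. The function names are hypothetical. Exponentiating the result gives only an approximate load in pounds per day because the LOADEST retransformation bias correction is not applied.

```python
import math

MISSING = {"NA", ".", ""}  # codes taken to mean "term not in the model"


def coef(row: dict, name: str) -> float:
    """Return a coefficient as a float, or 0.0 if the term is absent."""
    value = str(row.get(name, "NA")).strip()
    return 0.0 if value in MISSING else float(value)


def ln_load(row, q_cfs, qb_cfs, turb_fnu, dectime, storm):
    """Uncorrected ln(load) for one time step (sketch under stated assumptions)."""
    has_turb = coef(row, "lnTurb") != 0.0 or coef(row, "lnTurbStrm") != 0.0
    if has_turb and turb_fnu is None:
        raise ValueError("model has turbidity terms but no turbidity was given")
    s = 1.0 if storm else 0.0            # assumed 0/1 stormflow indicator
    ln_q = math.log(q_cfs)               # streamflow, cubic feet per second
    ln_qb = math.log(qb_cfs)             # daily base flow, cubic feet per second
    ln_t = math.log(turb_fnu) if turb_fnu is not None else 0.0
    t = dectime - float(row["Date_CV"])  # time centered on Date_CV
    theta = 2.0 * math.pi * dectime      # assumed seasonal angle
    return (
        coef(row, "Intercept")
        + coef(row, "biny9J") * s
        + coef(row, "lnQ") * ln_q
        + coef(row, "lnQ.BaseQ") * (1.0 - s) * ln_q
        + coef(row, "lnQb") * ln_qb
        + coef(row, "lnTurb") * ln_t
        + coef(row, "lnTurbStrm") * s * ln_t
        + coef(row, "sinDECTIME") * math.sin(theta)
        + coef(row, "cosDECTIME") * math.cos(theta)
        + coef(row, "DECTIME") * t
        + coef(row, "DECTIME2") * t * t
    )
```

For a streamflow-only model (Model_type = QOnly), turb_fnu can be passed as None. Loads computed this way are best treated as a check on how the terms combine, not as a replacement for the LOADEST output.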
Several model statistics are provided in the dataset. Model R-squares, also known as coefficients of determination, indicate the strength of the model fit and represent the fraction of the variance in concentration or load explained by the model.
Several model statistics apply to the model residuals, which represent the unexplained variance (error) in the model and were calculated as the observed minus the predicted loads. The serial correlation of the residuals indicates the degree to which errors are independent in time. The presence of high serial correlation can result in underestimating the uncertainty in load estimates (Aulenbach, 2013). The estimated residual variance is an indication of the uncertainty in the load estimates. The Turnbull-Weiss normality test statistic (Turnbull and Weiss, 1978) of the residuals indicates whether the residuals are normally distributed. Non-normal residuals can bias the model parameter fit, and because the assumption of normally distributed errors is then violated, the AMLE approach may not provide optimal load estimates for censored data (Runkel and others, 2004). However, McCulloch and Neuhaus (2012) indicated that ordinary least squares parameter estimates were still robust for non-normal distributions that exhibited skew (unsymmetrical) or kurtosis (heavy-tailed) but were still unimodal. The Grubbs and Beck outlier test criteria along with the range of normalized residuals were used to identify possible outliers, with possible outliers having residuals falling outside the test criteria range. However, the Grubbs and Beck outlier test criteria reported herein are those calculated after removing any outliers. Removing outliers reduces the model variance and the Grubbs and Beck criteria range, resulting in a tendency for additional sample concentrations to be identified as outliers. Hence, iterative use of these criteria can result in excessive removal of reasonable data and reduce uncertainty estimates such that they are not representative. These criteria were used only as a guide for outlier identification along with other evidence.
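The residual checks described above can be expressed as two small helper functions. This is a minimal sketch, assuming the rule stated in the TWNormTest attribute definition (a p-value of less than 0.05 indicates residuals that are not normally distributed) and a direct comparison of the minimum and maximum normalized residuals with the Grubbs and Beck limits. The function names are hypothetical; the returned labels mirror the Resid_Norm, Res_OutLow, and Res_OutUp values.

```python
def residual_normality(p_value: float, alpha: float = 0.05) -> str:
    """Label residual normality from the Turnbull-Weiss p-value."""
    return "Not normal" if p_value < alpha else "Normal"


def outlier_flags(resid_low: float, resid_up: float,
                  gb_lower: float, gb_upper: float) -> tuple:
    """Return (low, high) outlier flags as 'yes' or 'no'.

    A possible low outlier is flagged when the minimum normalized
    residual falls below the lower limit; a possible high outlier when
    the maximum normalized residual exceeds the upper limit.
    """
    low = "yes" if resid_low < gb_lower else "no"
    high = "yes" if resid_up > gb_upper else "no"
    return low, high
```

As noted above, a "yes" flag is only a prompt to examine the sample further, because the limits are computed after earlier outliers were removed.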
Portable document format files of LOADEST outputs, which contain model coefficients, statistics, and plots for evaluating model fits, are provided for each model in a “zip” file. Model coefficients and the most pertinent model statistics are compiled in the model summary. The residual plots allow for additional analysis to assess model fits. Model residuals should be random--having about equal variance both above and below the y-axis and across the range of values (also known as being identically distributed)--when plotted versus observed load and the model’s explanatory variables (streamflow, turbidity, and day of year). Unequal variance above and below the y-axis indicates that the model is biased over the range of conditions in which this occurs. Unequal variance across the range of values indicates that errors are higher or lower than the overall estimate of uncertainty during those conditions. Unequal variance may indicate that the model is not well-posed and may need additional transformations to linearize the variable relations with concentrations or require additional explanatory variables to explain variations in concentration.
Further details on model development and how to assess model fits are available in Aulenbach and others (2022, 2023).
Aulenbach, B.T., 2013, Improving regression-model-based streamwater constituent load estimates derived from serially correlated data: Journal of Hydrology, v. 503, p. 55-66, accessed July 11, 2020, at https://doi.org/10.1016/j.jhydrol.2013.09.001.
Aulenbach, B.T., Henley, J.C., and Hopkins, K.G., 2023, Hydrology, water-quality, and watershed characteristics in 15 watersheds in Gwinnett County, Georgia, water years 2002-20: U.S. Geological Survey Scientific Investigations Report 2023-5035.
Aulenbach, B.T., Kolb, K., Joiner, J.K., and Knaak, A.E., 2022, Hydrology and water quality in 15 watersheds in DeKalb County, Georgia, 2012-16: U.S. Geological Survey Scientific Investigations Report 2021-5126, 105 p., accessed August 8, 2022, at https://doi.org/10.3133/sir20215126.
McCulloch, C.E., and Neuhaus, J.M., 2012, Misspecifying the shape of a random effects distribution--Why getting it wrong may not matter: Statistical Science, v. 26, no. 3, p. 388-402, accessed April 20, 2020, at https://doi.org/10.1214/11-STS361.
Runkel, R.L., Crawford, C.G., and Cohn, T.A., 2004, Load estimator (LOADEST)--A FORTRAN program for estimating constituent loads in streams and rivers: U.S. Geological Survey Techniques and Methods, book 4, chap. A5, 69 p., accessed July 11, 2020, at https://doi.org/10.3133/tm4A5.
Turnbull, B.W., and Weiss, L., 1978, A likelihood ratio statistic for testing goodness of fit with randomly censored data: Biometrics, v. 34, p. 367-375, accessed July 11, 2020, at https://doi.org/10.2307/2530599.
U.S. Geological Survey
GS ScienceBase
mailing address
Denver Federal Center, Building 810, Mail Stop 302
Denver
CO
80225
United States
1-888-275-8747
sciencebase@usgs.gov
Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.
Digital Data
https://doi.org/10.5066/P9G8HZTY
None
20230404
Brent T Aulenbach
U.S. Geological Survey, Southeast Region
Research Hydrologist
mailing address
1770 Corporate Drive Suite 500
Norcross
GA
30093
US
678-924-6626
678-924-6710
btaulenb@usgs.gov
FGDC Content Standard for Digital Geospatial Metadata
FGDC-STD-001-1998