Data used to model and map manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA

DeSimone, L.A. Ransom, K.M. 20211018 Data used to model and map manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA comma-delimited text and tif-format raster files Reston, VA U.S. Geological Survey https://doi.org/10.5066/P9M64CD1 Leslie A. DeSimone Katherine M. Ransom 202110 Manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA—Modeling regional occurrence with pH, redox, and machine learning publication Journal of Hydrology: Regional Studies vol. 37 n/a Elsevier BV ppg. 100925 https://doi.org/10.1016/j.ejrh.2021.100925 Data used to model and map manganese concentrations in groundwater in the Northern Atlantic Coastal Plain (NACP) aquifer system, eastern USA, are documented in this data release. The model predicts manganese concentration within four classes and is based on concentration data from 4492 wells. The well data were compiled from U.S. Geological Survey, U.S. Environmental Protection Agency, Suffolk County Water Authority (Suffolk County, New York), and state agency sources. The four concentration classes are based on guidelines for drinking water quality: below detection (class 1, less than 10 micrograms per liter (ug/L)); detected but less than the aesthetic guideline of 50 ug/L (class 2); greater than the aesthetic guideline but less than the health guideline of 300 ug/L (class 3); and greater than the health guideline of 300 ug/L (class 4). The thresholds of 50 ug/L and 300 ug/L are a Secondary Maximum Contaminant Level and a lifetime health advisory, respectively, from the U.S. Environmental Protection Agency for public water supplies. The model is built with the XGboost machine learning method. Explanatory variables (predictors) include well depth, soil characteristics, hydrologic variables, groundwater residence time, and predicted values of pH and of the probability of low dissolved oxygen from previous machine learning models of the aquifer system. 
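The four concentration classes described above can be illustrated with a short sketch. The function name is hypothetical, and the handling of values exactly equal to 50 or 300 ug/L is an assumption, because the abstract does not say which class those boundary values fall in.

```python
def manganese_class(conc_ug_per_l: float) -> int:
    """Return the concentration class (1-4) for a manganese value in ug/L.

    Thresholds follow the abstract: 10 ug/L (detection), 50 ug/L
    (aesthetic guideline), and 300 ug/L (health guideline).
    Assumption: exactly 50 ug/L is placed in class 3 and exactly
    300 ug/L stays in class 3, since class 4 is "greater than 300".
    """
    if conc_ug_per_l < 10:
        return 1  # below detection
    if conc_ug_per_l < 50:
        return 2  # detected, below the aesthetic guideline
    if conc_ug_per_l <= 300:
        return 3  # above the aesthetic guideline, not above the health guideline
    return 4      # above the health guideline


if __name__ == "__main__":
    for value in (5, 25, 120, 450):
        print(value, manganese_class(value))
```

Censored results (those flagged with a "less than" remark in the well data) would need separate handling that this sketch does not attempt.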
The data are provided in data tables, raster files, and model files, organized as follows. One data table describes the 27 explanatory variables used in the model (NACP_Mn_explanatory_variables.csv). There is a data table for the well data used to develop the models, which includes the manganese concentrations, concentration classes, regional aquifer, explanatory variables, and predicted concentration class for the wells (NACP_Mn_well_data.csv). There is a compressed group (zip file) of 10 files (one for each regional aquifer) for explanatory variable data used to make predictions for the regional aquifers (NACP_Mn_prediction_input_aquifers.zip). There are two zip files providing model output, one for predictions made for each aquifer in text format and one for tif-format rasters of predictions for each aquifer. The data release also contains a tif-format raster file of the prediction grid and a zip file with the model object file (R data format) and a script that can be used to run the model to produce the predictions provided in this data release. Filenames for prediction input and for model output are distinguished by codes abbreviating the aquifer name and position in the vertical stack of 19 regional aquifers and confining units, as follows: Surficial aquifer, 1surf; Upper Chesapeake aquifer, 3upch; Lower Chesapeake aquifer, 5loch; Piney Point aquifer, 7pipt; Aquia aquifer, 9aqia; Monmouth - Mt. Laurel Aquifer, 11moml; Matawan aquifer, 13mtwn; Magothy Aquifer, 15mgty; Potomac-Patapsco aquifer, 17popt; Potomac-Patuxent aquifer, 19popx. The nine confining units are not represented in the model or predictions. These data were compiled to model and map manganese concentration in the NACP aquifer system. Manganese is an emerging health concern and a common nuisance contaminant in groundwater sources of drinking water. 
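The aquifer filename codes listed above combine a layer number with a short name. The sketch below shows one way to recover the layer number from a code and to confirm that the 10 aquifers occupy the odd-numbered layers of the 19-unit stack. The dictionary and function names are hypothetical and are not part of the data release.

```python
import re

# Aquifer filename codes as listed in the abstract of this metadata record.
AQUIFER_CODES = {
    "1surf": "Surficial aquifer",
    "3upch": "Upper Chesapeake aquifer",
    "5loch": "Lower Chesapeake aquifer",
    "7pipt": "Piney Point aquifer",
    "9aqia": "Aquia aquifer",
    "11moml": "Monmouth - Mt. Laurel aquifer",
    "13mtwn": "Matawan aquifer",
    "15mgty": "Magothy aquifer",
    "17popt": "Potomac-Patapsco aquifer",
    "19popx": "Potomac-Patuxent aquifer",
}


def layer_number(code: str) -> int:
    """Extract the leading layer number from an aquifer code such as '11moml'."""
    match = re.match(r"\d+", code)
    if match is None:
        raise ValueError(f"code has no leading layer number: {code!r}")
    return int(match.group())


# Layers occupied by aquifers, and the remaining layers (confining units).
aquifer_layers = sorted(layer_number(code) for code in AQUIFER_CODES)
confining_layers = [n for n in range(1, 20) if n not in aquifer_layers]

if __name__ == "__main__":
    print(aquifer_layers)
    print(confining_layers)
```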
The NACP aquifer system is an important water supply source, providing drinking water in a densely populated region along the eastern coast of the United States. Manganese is the most frequently occurring geogenic contaminant in the aquifer system. Models and maps of manganese concentrations in the NACP aquifer system can be used to identify areas where concentrations may be greatest and where future monitoring may be prioritized. 2021 publication date Complete None planned -78.2616 -71.1163 41.3556 34.5269 USGS Thesaurus groundwater quality None manganese regional groundwater quality machine learning XGboost class imbalance NAWQA USGS Metadata Identifier USGS:6017f3c0d34edf5c66ef8bfe Getty Thesaurus of Geographic Names New York New Jersey Delaware Maryland Virginia North Carolina Delmarva Peninsula Long Island None Northern Atlantic Coastal Plain aquifer system Surficial aquifer Upper Chesapeake aquifer Lower Chesapeake aquifer Piney Point aquifer Aquia aquifer Monmouth - Mt. Laurel aquifer Matawan aquifer Magothy aquifer Potomac - Patapsco aquifer Potomac - Patuxent aquifer none none Leslie A Desimone New England Water Science Center Hydrologist mailing and physical

10 Bearfoot Road

Northborough MA 01532 US 508-490-5023 ldesimon@usgs.gov Funding provided by the National Water-Quality Assessment (NAWQA) project No formal attribute accuracy tests were conducted No formal logical accuracy tests were conducted Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details. No formal positional accuracy tests were conducted No formal positional accuracy tests were conducted See separate file, "NACP_Mn_explanatory_variables.csv", for a list of publications referenced in this data release 2021 NACP_Mn_explanatory_variables.csv Various Various Various Various Various 20030101 20201231 Source input data were current as described by their publication or access dates Various Source data used to create explanatory variables A 1-square kilometer grid for the study area was created from the national-scale 1-square kilometer grid (Clark and others, 2018) for use in prediction and in mapping study results, as described in DeSimone and Pope (2020). This study-area raster is available in this data release in the file "NACP_Mn_prediction_grid.zip". References: Clark, B.R., Barlow, P.M., Peterson, S.M., Hughes, J.D., Reeves, H.W., and Viger, R.J., 2018, National-scale grid to support regional groundwater availability studies and a national hydrogeologic database: U.S. Geological Survey data release, https://doi.org/10.5066/F7P84B24 DeSimone, L.A., and Pope, J.P., 2020, Data used to model and map pH and redox conditions in the Northern Atlantic Coastal Plain aquifer system, eastern USA: U.S. Geological Survey data release, https://doi.org/10.5066/P94DYERF 2018 Manganese concentrations from well water were compiled from the U.S. Geological Survey (USGS) National Water Information System (NWIS) database, the U.S. 
Environmental Protection Agency Safe Drinking Water Information System (SDWIS) database, several state agencies, and the Suffolk County Water Authority (SCWA). Data from NWIS (76% of data), SDWIS (13%), and state agencies (<1%) were compiled as part of a national data aggregation by the USGS, in support of multiple water-quality investigations of the National Water Quality Assessment Project (Erickson and others, 2019). Data from the national aggregation were from samples collected between 1988 and 2018 for heterogeneous purposes from various well types. Data from the national aggregation were augmented on Long Island, NY, by data from the Suffolk County Water Authority (SCWA; 11% of the entire well data set). SCWA data were primarily from public supply wells and were all from samples collected in 2002. Reference: Erickson, M.L., Yager, R.M., Kauffman, L.J., and Wilson, J.T., 2019, Drinking water quality in the glacial aquifer system, northern USA: Science of the Total Environment, v. 694, https://doi.org/10.1016/j.scitotenv.2019.133735 2020 Explanatory variable data were obtained from various sources as described in the separate file "NACP_Mn_explanatory_variables.csv." 2020 Wells were associated with aquifers using well screen elevations and ancillary information on the aquifer from the source database. Aquifer was not used as an explanatory variable in the model but was used to attribute other explanatory variables that varied by aquifer to wells and to categorize water-quality data from individual aquifers in the aquifer system. A Python script was used to determine the intersections of well screen elevations and the top and bottom elevations of the 19 hydrogeologic units (aquifers and confining units). Hydrogeologic unit elevations were as represented in the regional groundwater flow model (MODFLOW) of Masterson and others (2016), modified from Pope and others (2016). 
Well screen top and bottom elevations were calculated from well minimum and maximum open interval depths in the data sources and land surface elevation from Pope and others (2016). Any missing well screen bottom elevations were set to well depth, if available. Any missing screen top elevation was set to land surface. The Python script output and ancillary information were manually reviewed to determine the final aquifer designation. Wells with screened intervals that fell within the top and bottom elevations of an aquifer hydrogeologic unit were designated as in that aquifer. As expected, not all regional-scale data on unit occurrence and elevations and the local-scale data for wells agreed. Wells with screens that intersected an aquifer unit but also extended into an adjacent confining unit were designated with that aquifer, especially if ancillary information on the aquifer from the source database agreed. Wells were omitted for several reasons: if both well depth and depth to maximum open interval were missing; if well screens were entirely within confining units and there was no ancillary aquifer information; if well screens spanned more than one aquifer separated by a confining unit; or if wells were missing a screen top elevation and the aquifer at the bottom of the well screen was not the aquifer indicated by the ancillary information from the source database. References: Masterson, J.P., Pope, J.P., Fienen, M.N., Monti, Jack Jr., Nardi, M.R., and Finkelstein, J.S., 2016, Documentation of a groundwater flow model developed to assess groundwater availability in the Northern Atlantic Coastal Plain aquifer system from Long Island, New York, to North Carolina (ver. 1.1, December 2016): U.S. Geological Survey Scientific Investigations Report 2016–5076, 70 p., http://dx.doi.org/10.3133/sir20165076. 
Pope, J.P., Andreasen, D.C., McFarland, E.R., and Watt, M.K., 2016, Digital elevations and extents of regional hydrogeologic units in the Northern Atlantic Coastal Plain aquifer system from Long Island, New York, to North Carolina: U.S. Geological Survey Data Series 996, 28 p., http://dx.doi.org/10.3133/ds996. 2020 Distance from the Fall Zone (dist_fallzone) was calculated using the ArcMap (v. 10.6.1) “Near” Analysis tool. This distance represents the shortest distance to the western boundary of the study area (the Fall Zone) or, on Long Island, the distance to a line representing the maximum water table of Como and others (2015). Reference: Como, M.D., Noll, M.L., Finkelstein, J.S., Monti, J., Jr., and Busciolano, R., 2015, Water-table and potentiometric-surface altitudes in the Upper Glacial, Magothy, and Lloyd aquifers of Long Island, New York, April–May 2013: U.S. Geological Survey Scientific Investigations Map 3326, http://dx.doi.org/10.3133/sim3326. 2020 For the wetland variable (wet_all), separate data sets for individual states in the study area (NY, NJ, DE, NC, MD, PA, and VA) were clipped to the study area and combined using data analysis tools of ArcMap (v. 10.6.1). Areas of the following wetland classes of U.S. Fish and Wildlife Service (2019) were combined into a single spatial variable to represent wetland area: estuarine and marine wetland, freshwater and emergent wetland, freshwater forested/shrub, freshwater pond, riverine, lake, and other. Reference: U.S. Fish and Wildlife Service. 2019. National Wetlands Inventory website. U.S. Department of the Interior, Fish and Wildlife Service, Washington, D.C. 
http://www.fws.gov/wetlands/, accessed March 14, 2020 2020 Explanatory variables representing land-surface characteristics (bfi, DSD1, DSD5, DSD9, evapot, fert_N, LP1, LP5, LP9, ls_elev, recharge, soil_clay_pct, soil_sand_pct, soil_silt_pct, wet_all) were attributed to wells and to prediction grid points using either a point extraction or an aggregation of values within 500-meter circular buffer areas surrounding the well or grid point locations. Point extractions were done in ESRI’s ArcMap software (v. 10.6.1) using the “Identity” Analysis tool for vector data layers or the “Extract Multi Values to Points” Spatial Analyst tool for raster data. Aggregation of values within buffer areas was done for raster data only. A Python script subdivided source-data raster cells into smaller areas and then aggregated values within the buffer areas (Clark and others, 2019). Area-weighted means (numeric or percentage values) were then calculated for the buffer areas. Reference: Clark, B.R., Knierim, K.J., and Duncan, L., 2019, zonepy, version 0.0, https://github.com/brclark-usgs/zonepy 2020 Variables from the groundwater flow model and groundwater residence times (age) (age_10th_pctl, age_50th_pctl, age_90th_pctl, aq_hk, hk_thk, gw_flux, gw_head) were attributed to wells by setting them equal to the values of the 1-square-mile (about 2.6-square-kilometer) MODFLOW groundwater flow model grid cell in which the well was located (Masterson and others, 2016) and for the model layer corresponding to the aquifer designation of the well. Flow model and residence time variables were attributed to prediction grid points using ArcGIS to intersect the 1-square-mile model grid cell with the prediction grid and then calculating in R the area-weighted averages of MODFLOW variables within prediction grid cells, for each aquifer. 
Reference: Masterson, J.P., Pope, J.P., Fienen, M.N., Monti, Jack Jr., Nardi, M.R., and Finkelstein, J.S., 2016, Documentation of a groundwater flow model developed to assess groundwater availability in the Northern Atlantic Coastal Plain aquifer system from Long Island, New York, to North Carolina (ver. 1.1, December 2016): U.S. Geological Survey Scientific Investigations Report 2016–5076, 70 p., http://dx.doi.org/10.3133/sir20165076. 2020 The variables of predicted pH (pred_ph) and the predicted probability of dissolved oxygen less than 2 milligrams per liter (pred_do2) were attributed to wells by setting them equal to the values of the 1-square-kilometer prediction grid and aquifer in which the well was located. The same prediction grid used in the present study was used to create the pred_ph and pred_do2 variables; thus, these variables were assigned to prediction grid points using their source information (DeSimone and Pope, 2020). Reference: DeSimone, L.A., and Pope, J.P., 2020, Data used to model and map pH and redox conditions in the Northern Atlantic Coastal Plain aquifer system, eastern USA: U.S. Geological Survey data release, https://doi.org/10.5066/P94DYERF. 2020 Separate data files of explanatory variables were created for each of the 10 regional aquifers in the aquifer system. Variables representing surficial and other characteristics that did not vary by aquifer were the same in all 10 files. Variables that varied by aquifer, including variables based on the groundwater flow model, groundwater residence time, and predicted pH and dissolved oxygen probability, were set to the values appropriate for the aquifer layer. The variables of well depth and screen bottom elevation for each aquifer were set equal to the values corresponding to the vertical midpoint of each aquifer at each grid cell. The predevelopment water table, as simulated by the groundwater flow model, was used as the top of the surficial aquifer for these calculations. 
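The area-weighted averaging used in the process steps above, both for buffer areas and for flow-model cells intersecting a prediction grid cell, can be sketched as follows. This is a minimal illustration only: the function name and input layout are assumptions, and the study itself used zonepy, ArcGIS, and R for these calculations.

```python
def area_weighted_mean(pieces):
    """Return the area-weighted mean of values over overlapping pieces.

    `pieces` is an iterable of (value, overlap_area) pairs, where
    overlap_area is the area of a source cell that falls inside the
    target buffer or prediction grid cell (any consistent area unit).
    """
    pieces = list(pieces)
    total_area = sum(area for _, area in pieces)
    if total_area <= 0:
        raise ValueError("total overlap area must be positive")
    return sum(value * area for value, area in pieces) / total_area


if __name__ == "__main__":
    # Hypothetical example: one prediction grid cell overlapped by two
    # flow-model cells covering 25% and 75% of its area.
    print(area_weighted_mean([(10.0, 0.25), (20.0, 0.75)]))
```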
2020 The XGboost method (Chen and Guestrin, 2016; Chen and others, 2020) was used to fit a machine learning model to the well data, as described in DeSimone and Ransom (2021). The resulting model is a classification model that predicts manganese concentration within four classes: less than 10 micrograms per liter (ug/L), 10 to 50 ug/L, 50 to 300 ug/L, and greater than 300 ug/L. The model was used to make predictions of manganese concentration class for the wells, as part of model development, to determine model fit. The model was then used to make predictions across the study area at the prediction grid points for each of the 10 regional aquifers. References: Chen, T., and Guestrin, C., 2016. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2939672.2939785. Chen, T., He, T., Benesty, M., Guestrin, C., 2020. Package ‘xgboost’. Accessed October 16, 2020 at https://cran.r-project.org/web/packages/xgboost/xgboost.pdf. DeSimone, L.A., and Ransom, K.M., 2021, Manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA—Modeling regional occurrence with pH, redox, and machine learning: Journal of Hydrology: Regional Studies, v. 37, 100925, https://doi.org/10.1016/j.ejrh.2021.100925. 2020 Tif-format raster files of predictions were created in R by substituting the predicted value for the grid cell identification variable (gridcode) in the study-area raster derived from the national grid raster. 
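The raster-creation step above, in which predicted values replace gridcode values in the study-area raster, can be sketched as a simple lookup. The original work was done in R; this is only an illustration in Python, and the function name, the nested-list raster layout, and the no-data value are all assumptions.

```python
NODATA = -9999  # hypothetical no-data value; the release does not state one here


def substitute_predictions(gridcode_raster, predictions, nodata=NODATA):
    """Replace each gridcode in a 2-D raster with its predicted value.

    `gridcode_raster` is a list of rows of integer gridcodes and
    `predictions` maps gridcode -> predicted value. Cells with no
    prediction (for example, outside an aquifer's extent) get `nodata`.
    """
    return [
        [predictions.get(code, nodata) for code in row]
        for row in gridcode_raster
    ]


if __name__ == "__main__":
    # Tiny made-up raster and predicted classes, for illustration only.
    raster = [[101, 102], [103, 104]]
    predicted = {101: 1, 102: 3, 104: 4}
    print(substitute_predictions(raster, predicted))
```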
2020 See the metadata describing individual data tables and raster files for spatial data organization Raster Albers Conical Equal Area 29.5 45.5 -96.0 23.0 0.0 0.0 coordinate pair 1000.0 1000.0 METERS WGS84 WGS84 6378137.0 298.257223563 NACP_Mn_explanatory_variables.csv Tabular, comma-delimited file defining the explanatory variables (predictors) listed in the well and prediction input files and used in the model Producer defined Variable_name Name of the explanatory variable, corresponding to column headings in well and prediction input files and to variable names in models Producer defined A short name for the explanatory variable Description A short description of the explanatory variable Producer defined Text defining the explanatory variable Varies_by_aquifer Describes whether predictor is different for each of the 10 regional aquifers or is the same for all aquifers Producer defined yes The predictor is different for each of the 10 regional aquifers Producer defined no The predictor is the same for all of the regional aquifers Producer defined Units Lists the units of the variable Producer defined Text describing variable units Time_period Describes the time period associated with the variable Producer defined The time period is listed for variables that are associated with specific time periods and "NA" is listed for variables not associated with a specific time period Method_of_attribution Method of attributing variable to well or prediction grid point Producer defined See the "Data Quality" section of this data release and the supplementary information of DeSimone and others (2020) for explanation of attribution processes Reference: DeSimone, L.A., Pope, J.P., Ransom, K.M. 2020. Machine learning models of pH and dissolved oxygen in the Northern Atlantic Coastal Plain aquifer system, eastern USA. J. Hydrol. Reg. Studies. 30, 100697. 
https://doi.org/10.1016/j.ejrh.2020.100697 Source_short_citations Short citation or citations for the data source, separated by semi-colons Producer defined Author(s) and publication date for the data source(s) Source_full_citation1 Complete reference for the first (or only) data source Producer defined Author(s), publication date, title, publisher, and online link for the data source Source_full_citation2 Complete reference for the second data source, if there is one Producer defined Author(s), publication date, title, publisher, and online link for the data source if there is a second data source; otherwise "NA" NACP_Mn_well_data.csv Tabular, comma-delimited file containing well data used in model development, including manganese concentrations and concentration classes, aquifer, explanatory variables, whether the well was used to train or test the model, and model predictions. Model predictions consist of the predicted probability of membership in each of the four concentration classes and the final predicted class for each well. Data columns for the explanatory variables are described in a separate file, "NACP_Mn_explanatory_variables.csv" Producer defined well_seqno A sequential numeric identifier for the well that also indicates the sort order of the well data file when used as input to the model Producer defined 1 4492 unitless STAID Well identifier Producer defined A numeric or alphanumeric identifier that uniquely identifies the well. data_source Source of the well data Producer defined NWIS U.S. Geological Survey National Water Information System database Producer defined SDWIS U.S. 
Environmental Protection Agency Safe Drinking Water Information System database Producer defined GWAM State agency databases Producer defined SCWA Suffolk County Water Authority Producer defined aquifer Name of the regional aquifer from which the well withdraws water Producer defined 1_Surficial Surficial aquifer, layer 1 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 3_UpperChesapeake Upper Chesapeake aquifer, layer 3 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 5_LowerChesapeake Lower Chesapeake aquifer, layer 5 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 7_PineyPoint Piney Point aquifer, layer 7 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 9_Aquia Aquia aquifer, layer 9 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 11_MonmouthMtLaurel Monmouth - Mt. 
Laurel aquifer, layer 11 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 13_Matawan Matawan aquifer, layer 13 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 15_Magothy Magothy aquifer, layer 15 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 17_PotomacPatapsco Potomac-Patapsco aquifer, layer 17 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined 19_PotomacPatuxent Potomac-Patuxent aquifer, layer 19 from the surface in the vertical stack of 19 regional aquifers and intervening confining units Producer defined NA No aquifer is associated with the well Producer defined mn_rmk Remark value for manganese concentration Producer defined < Less than Producer defined NA No remark Producer defined mn_val Value for manganese concentration Producer defined 0.001 75100 Micrograms per liter predprob_class1 Model-predicted probability that manganese concentrations at the well are within the range of class 1 (less than 10 micrograms per liter) Producer defined 0 1 unitless predprob_class2 Model-predicted probability that manganese concentrations at the well are within the range of class 2 (10 to 50 micrograms per liter) Producer defined 0 1 unitless predprob_class3 Model-predicted probability that manganese concentrations at the well are within the range of class 3 (50 to 300 micrograms per liter) Producer defined 0 1 unitless predprob_class4 Model-predicted probability that manganese concentrations at the well are within the range of class 4 (greater than 300 micrograms per liter) Producer defined 0 1 unitless predclass Model-predicted concentration class for manganese concentrations at the well. This is the class that had the highest predicted probability of the four classes. 
Producer defined 1 class 1: less than 10 micrograms per liter Producer defined 2 class 2: 10 to 50 micrograms per liter Producer defined 3 class 3: 50 to 300 micrograms per liter Producer defined 4 class 4: greater than 300 micrograms per liter Producer defined Attributes consisting of the explanatory variables as defined in a separate file These attributes are the 27 explanatory variables (predictors) used to develop the model and make predictions. They are defined and described in a separate file, "NACP_Mn_explanatory_variables.csv" (columns "age_10th_pctl" through "soil_sand_pct"). Producer defined See the separate file, "NACP_Mn_explanatory_variables.csv" data_type Field that indicates whether the well was used for training or testing the model. Producer defined Train The well was used to train the model. Producer defined Test The well was used to test the model. Producer defined NACP_Mn_prediction_input_aquifers.zip A zip file containing 10 tabular, comma-delimited files of input data used to predict manganese concentration class for the 10 regional aquifers in the study area. Each file contains a grid cell identifier attribute (gridcode) and the explanatory variable data used to predict manganese concentration class probability at the central nodal points of a 1-square-kilometer grid across the study area. The filenames consist of "nacp_mn_predgridin_" followed by a code for the aquifer. The codes for the aquifers are listed in the Abstract. Column headings are described in a separate file, "NACP_Mn_explanatory_variables.csv" Producer defined gridcode Number identifying the grid cell from the national grid source data layer, described in the first process step of this metadata. Producer defined This attribute corresponds to the cell value in the national 1-km raster, described in the first process step of this metadata. 
Attributes consisting of the explanatory variables as defined in a separate file These attributes are the 27 explanatory variables used to develop the model and make predictions. They are defined and described in a separate file, "NACP_Mn_explanatory_variables.csv". Producer defined See the separate file, "NACP_Mn_explanatory_variables.csv" NACP_Mn_prediction_output_aquifers_data.zip A zip file containing 10 tabular, comma-delimited files of the predicted class probabilities and predicted class for the 10 regional aquifers in the study area. Each csv file contains a grid cell identifier (gridcode) and the 6 model output values at the central nodal points of a 1-square-kilometer grid across the study area. The raster tif files correspond spatially to the prediction grid and each tif file has predicted pH as its value. The filenames consist of "predout_ph_" (for csv files) or "predras_ph_" (for tif files) followed by a code for the aquifer. The codes for the aquifers are listed in the Abstract. Producer defined gridcode Sequential number identifying the grid cell of the prediction grid for which the prediction is made Producer defined This attribute corresponds to the cell value in the national raster. 
predprob_class1_aqX, where "X" is a number designating the aquifer for which the prediction is made Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 1 (less than 10 micrograms per liter) Producer defined 0 1 unitless predprob_class2_aqX, where "X" is a number designating the aquifer for which the prediction is made Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 2 (10 to 50 micrograms per liter) Producer defined 0 1 unitless predprob_class3_aqX, where "X" is a number designating the aquifer for which the prediction is made Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 3 (50 to 300 micrograms per liter) Producer defined 0 1 unitless predprob_class4_aqX, where "X" is a number designating the aquifer for which the prediction is made Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 4 (greater than 300 micrograms per liter) Producer defined 0 1 unitless predclass_aqX, where "X" is a number designating the aquifer for which the prediction is made Model-predicted concentration class for manganese concentrations at the grid cell point. This is the class that had the largest predicted probability of the four classes. Producer defined 1 class 1: less than 10 micrograms per liter Producer defined 2 class 2: 10 to 50 micrograms per liter Producer defined 3 class 3: 50 to 300 micrograms per liter Producer defined 4 class 4: greater than 300 micrograms per liter Producer defined predprob_predclass_aqX, where "X" is a number designating the aquifer for which the prediction is made Model-predicted probability of the predicted class at the grid cell point. This is the predicted probability for the class that had the largest predicted probability of the four classes. 
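The relation between the four class probabilities, the predicted class, and the probability of the predicted class described above can be sketched as follows. The function name is hypothetical, and resolving ties in favor of the lower class is an assumption, since the metadata does not say how ties are handled.

```python
def predicted_class(class_probs):
    """Return (predclass, predprob_predclass) from four class probabilities.

    `class_probs` holds the predicted probabilities for classes 1 to 4,
    in that order. The predicted class is the one with the largest
    probability. Assumption: ties resolve to the lowest class number.
    """
    if len(class_probs) != 4:
        raise ValueError("expected probabilities for exactly four classes")
    # max() keeps the first index among equal values, so ties go to the lower class.
    best_index = max(range(4), key=lambda i: class_probs[i])
    return best_index + 1, class_probs[best_index]


if __name__ == "__main__":
    # Made-up probabilities for one grid cell, for illustration only.
    print(predicted_class([0.1, 0.2, 0.6, 0.1]))
```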
Producer defined 0 1 unitless NACP_Mn_prediction_output_aquifers_rasters.zip A zip file containing tif-format raster files of model predictions for the 10 regional aquifers in the study area. There are 10 files, one for each aquifer, for each of 6 model output values at the central nodal points of a 1-square-kilometer grid across the study area. The following terms in the filenames are used to designate the type of model output in the file: predclass, predprob_class1, predprob_class2, predprob_class3, predprob_class4, and predprob_predclass. These model output values are defined as follows. Predclass is the model-predicted concentration class at the grid point. Predprob_class1, predprob_class2, predprob_class3, and predprob_class4 are the model-predicted probability of membership in concentration classes 1 through 4, respectively, at the grid point. Predprob_predclass is the model-predicted probability of the predicted class at the grid point. Each file name also includes a code for the aquifer, as listed in the Abstract. Producer defined Raster values correspond to one of six types of model output values; there are no other attributes. Raster values correspond to one of six types of model output values, indicated by the file name, as follows. Predclass is the model-predicted concentration class at the grid point. Predprob_class1, predprob_class2, predprob_class3, and predprob_class4 are the model-predicted probability of membership in concentration classes 1 through 4, respectively, at the grid point. Predprob_predclass is the model-predicted probability of the predicted class at the grid point. Producer defined See the Entity Type and Attribute descriptions NACP_Mn_prediction_grid.zip Raster tif file of 1-square-kilometer grid used for prediction across the study area and a file, "gridcode_pred.csv", containing the gridcodes of cells for which predictions were made within the study area. 
Clark and others (2018), clipped to the study area Reference: Clark, B.R., Barlow, P.M., Peterson, S.M., Hughes, J.D., Reeves, H.W., and Viger, R.J., 2018, National-scale grid to support regional groundwater availability studies and a national hydrogeologic database: U.S. Geological Survey data release, https://doi.org/10.5066/F7P84B24 gridcode (the raster value in tif files, not a named attribute) Sequential number identifying the grid cell of the prediction grid (Clark and others, 2018) for which the prediction is made Clark and others (2018) 8111983 12070911 NACP_Mn_model.zip A zip file containing the model object file, in R data format, and an R script used to read and run the model to make predictions. The zip file also includes a text README file providing information on software and systems used to develop and run the model. Producer defined Files in the zip file are described in a text README file; there are no data table attributes. See the text README file within the zip file for detailed description of zip file contents; there are no data table attributes. Producer defined See the Entity Type and Attribute descriptions GS ScienceBase U.S. Geological Survey mailing and physical

Denver Federal Center, Building 810, Mail Stop 302

Denver CO 80225 United States 1-888-275-8747 sciencebase@usgs.gov Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data have been processed successfully on a computer system at the USGS, no warranty expressed or implied is made regarding the display or utility of the data for other purposes, nor on all computer systems, nor shall the act of distribution constitute any such warranty. The USGS or the U.S. Government shall not be held liable for improper or incorrect use of the data described and/or contained herein. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. 20211018 Leslie A Desimone New England Water Science Center Hydrologist mailing and physical

10 Bearfoot Road

Northborough MA 01532 US 508-490-5023 ldesimon@usgs.gov Content Standard for Digital Geospatial Metadata FGDC-STD-001-1998