<?xml version='1.0' encoding='UTF-8'?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <idinfo>
    <citation>
      <citeinfo>
        <origin>DeSimone, L.A.</origin>
        <origin>Ransom, K.M.</origin>
        <pubdate>20211018</pubdate>
        <title>Data used to model and map manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA</title>
        <geoform>comma-delimited text and tif-format raster files</geoform>
        <pubinfo>
          <pubplace>Reston, VA</pubplace>
          <publish>U.S. Geological Survey</publish>
        </pubinfo>
        <onlink>https://doi.org/10.5066/P9M64CD1</onlink>
        <lworkcit>
          <citeinfo>
            <origin>Leslie A. DeSimone</origin>
            <origin>Katherine M. Ransom</origin>
            <pubdate>202110</pubdate>
            <title>Manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA—Modeling regional occurrence with pH, redox, and machine learning</title>
            <geoform>publication</geoform>
            <serinfo>
              <sername>Journal of Hydrology: Regional Studies</sername>
              <issue>vol. 37</issue>
            </serinfo>
            <pubinfo>
              <pubplace>n/a</pubplace>
              <publish>Elsevier BV</publish>
            </pubinfo>
            <othercit>Article 100925</othercit>
            <onlink>https://doi.org/10.1016/j.ejrh.2021.100925</onlink>
          </citeinfo>
        </lworkcit>
      </citeinfo>
    </citation>
    <descript>
      <abstract>Data used to model and map manganese concentrations in groundwater in the Northern Atlantic Coastal Plain (NACP) aquifer system, eastern USA, are documented in this data release. The model predicts manganese concentration within four classes and is based on concentration data from 4492 wells. The well data were compiled from U.S. Geological Survey, U.S. Environmental Protection Agency, Suffolk County Water Authority (Suffolk County, New York), and state agency sources. The four concentration classes are based on guidelines for drinking-water quality: below detection (class 1, less than 10 micrograms per liter (ug/L)); detected but less than the aesthetic guideline of 50 ug/L (class 2); greater than the aesthetic guideline but less than the health guideline of 300 ug/L (class 3); and greater than the health guideline of 300 ug/L (class 4). The thresholds of 50 ug/L and 300 ug/L are a Secondary Maximum Contaminant Level and a lifetime health advisory, respectively, from the U.S. Environmental Protection Agency for public water supplies. The model was built with the XGBoost machine learning method. Explanatory variables (predictors) include well depth, soil characteristics, hydrologic variables, groundwater residence time, and predicted values of pH and of the probability of low dissolved oxygen from previous machine learning models of the aquifer system.

The data are provided in data tables, raster files, and model files, organized as follows. One data table describes the 27 explanatory variables used in the model (NACP_Mn_explanatory_variables.csv). A second data table contains the well data used to develop the model, including the manganese concentrations, concentration classes, regional aquifer, explanatory variables, and predicted concentration class for each well (NACP_Mn_well_data.csv). A zip file of 10 files, one for each regional aquifer, contains the explanatory variable data used to make predictions for the regional aquifers (NACP_Mn_prediction_input_aquifers.zip). Two zip files provide model output: one contains predictions for each aquifer in text format, and the other contains tif-format rasters of predictions for each aquifer. The data release also contains a tif-format raster file of the prediction grid and a zip file with the model object file (R data format) and a script that can be used to run the model to produce the predictions provided in this data release.

Filenames for prediction input and model output are distinguished by codes that abbreviate the aquifer name and give its position in the vertical stack of 19 hydrogeologic units (10 regional aquifers and 9 confining units), as follows: Surficial aquifer, 1surf; Upper Chesapeake aquifer, 3upch; Lower Chesapeake aquifer, 5loch; Piney Point aquifer, 7pipt; Aquia aquifer, 9aqia; Monmouth - Mt. Laurel aquifer, 11moml; Matawan aquifer, 13mtwn; Magothy aquifer, 15mgty; Potomac-Patapsco aquifer, 17popt; Potomac-Patuxent aquifer, 19popx. The nine confining units are not represented in the model or predictions.</abstract>
      <purpose>These data were compiled to model and map manganese concentration in the NACP aquifer system. Manganese is an emerging health concern and a common nuisance contaminant in groundwater sources of drinking water. The NACP aquifer system is an important water supply source, providing drinking water in a densely populated region along the eastern coast of the United States.  Manganese is the most frequently occurring geogenic contaminant in the aquifer system. Models and maps of manganese concentrations in the NACP aquifer system can be used to identify areas where concentrations may be greatest and where future monitoring may be prioritized.</purpose>
    </descript>
    <timeperd>
      <timeinfo>
        <sngdate>
          <caldate>2021</caldate>
        </sngdate>
      </timeinfo>
      <current>publication date</current>
    </timeperd>
    <status>
      <progress>Complete</progress>
      <update>None planned</update>
    </status>
    <spdom>
      <bounding>
        <westbc>-78.2616</westbc>
        <eastbc>-71.1163</eastbc>
        <northbc>41.3556</northbc>
        <southbc>34.5269</southbc>
      </bounding>
    </spdom>
    <keywords>
      <theme>
        <themekt>USGS Thesaurus</themekt>
        <themekey>groundwater quality</themekey>
      </theme>
      <theme>
        <themekt>None</themekt>
        <themekey>manganese</themekey>
        <themekey>regional groundwater quality</themekey>
        <themekey>machine learning</themekey>
        <themekey>XGBoost</themekey>
        <themekey>class imbalance</themekey>
        <themekey>NAWQA</themekey>
      </theme>
      <theme>
        <themekt>USGS Metadata Identifier</themekt>
        <themekey>USGS:6017f3c0d34edf5c66ef8bfe</themekey>
      </theme>
      <place>
        <placekt>Getty Thesaurus of Geographic Names</placekt>
        <placekey>New York</placekey>
        <placekey>New Jersey</placekey>
        <placekey>Delaware</placekey>
        <placekey>Maryland</placekey>
        <placekey>Virginia</placekey>
        <placekey>North Carolina</placekey>
        <placekey>Delmarva Peninsula</placekey>
        <placekey>Long Island</placekey>
      </place>
      <place>
        <placekt>None</placekt>
        <placekey>Northern Atlantic Coastal Plain aquifer system</placekey>
        <placekey>Surficial aquifer</placekey>
        <placekey>Upper Chesapeake aquifer</placekey>
        <placekey>Lower Chesapeake aquifer</placekey>
        <placekey>Piney Point aquifer</placekey>
        <placekey>Aquia aquifer</placekey>
        <placekey>Monmouth - Mt. Laurel aquifer</placekey>
        <placekey>Matawan aquifer</placekey>
        <placekey>Magothy aquifer</placekey>
        <placekey>Potomac - Patapsco aquifer</placekey>
        <placekey>Potomac - Patuxent aquifer</placekey>
      </place>
    </keywords>
    <accconst>none</accconst>
    <useconst>none</useconst>
    <ptcontac>
      <cntinfo>
        <cntperp>
          <cntper>Leslie A. DeSimone</cntper>
          <cntorg>New England Water Science Center</cntorg>
        </cntperp>
        <cntpos>Hydrologist</cntpos>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>10 Bearfoot Road</address>
          <city>Northborough</city>
          <state>MA</state>
          <postal>01532</postal>
          <country>US</country>
        </cntaddr>
        <cntvoice>508-490-5023</cntvoice>
        <cntemail>ldesimon@usgs.gov</cntemail>
      </cntinfo>
    </ptcontac>
    <datacred>Funding provided by the National Water-Quality Assessment (NAWQA) project</datacred>
  </idinfo>
  <dataqual>
    <attracc>
      <attraccr>No formal attribute accuracy tests were conducted</attraccr>
    </attracc>
    <logic>No formal logical accuracy tests were conducted</logic>
    <complete>Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.</complete>
    <posacc>
      <horizpa>
        <horizpar>No formal positional accuracy tests were conducted</horizpar>
      </horizpa>
      <vertacc>
        <vertaccr>No formal positional accuracy tests were conducted</vertaccr>
      </vertacc>
    </posacc>
    <lineage>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>See separate file, "NACP_Mn_explanatory_variables.csv", for a list of publications referenced in this data release</origin>
            <pubdate>2021</pubdate>
            <title>NACP_Mn_explanatory_variables.csv</title>
            <geoform>Various</geoform>
            <pubinfo>
              <pubplace>Various</pubplace>
              <publish>Various</publish>
            </pubinfo>
            <onlink>Various</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Various</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>20030101</begdate>
              <enddate>20201231</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>Source input data were current as described by their publication or access dates</srccurr>
        </srctime>
        <srccitea>Various</srccitea>
        <srccontr>Source data used to create explanatory variables</srccontr>
      </srcinfo>
      <procstep>
        <procdesc>A 1-square-kilometer grid for the study area was created from the national-scale 1-square-kilometer grid (Clark and others, 2018) for use in prediction and in mapping study results, as described in DeSimone and Pope (2020). This study-area raster is available in this data release in the file "NACP_Mn_prediction_grid.zip".

References:
Clark, B.R., Barlow, P.M., Peterson, S.M., Hughes, J.D., Reeves, H.W., and Viger, R.J., 2018, National-scale grid to support regional groundwater availability studies and a national hydrogeologic database: U.S. Geological Survey data release, https://doi.org/10.5066/F7P84B24

DeSimone, L.A., and Pope, J.P., 2020, Data used to model and map pH and redox conditions in the Northern Atlantic Coastal Plain aquifer system, eastern USA: U.S. Geological Survey data release, https://doi.org/10.5066/P94DYERF</procdesc>
        <procdate>2018</procdate>
      </procstep>
      <procstep>
        <procdesc>Manganese concentrations in well water were compiled from the U.S. Geological Survey (USGS) National Water Information System (NWIS) database, the U.S. Environmental Protection Agency Safe Drinking Water Information System (SDWIS) database, several state agencies, and the Suffolk County Water Authority (SCWA). Data from NWIS (76% of the well data set), SDWIS (13%), and state agencies (&lt;1%) were compiled as part of a national data aggregation by the USGS in support of multiple water-quality investigations of the National Water-Quality Assessment (NAWQA) Project (Erickson and others, 2019). Data from the national aggregation were from samples collected between 1988 and 2018 for a variety of purposes and from various well types. On Long Island, NY, the national aggregation was augmented by data from the SCWA (11% of the entire well data set). SCWA data were primarily from public supply wells and were all from samples collected in 2002.

Reference:
Erickson, M.L., Yager, R.M., Kauffman, L.J., and Wilson, J.T., 2019, Drinking water quality in the glacial aquifer system, northern USA: Science of the Total Environment, v. 694, https://doi.org/10.1016/j.scitotenv.2019.133735</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Explanatory variable data were obtained from various sources as described in the separate file "NACP_Mn_explanatory_variables.csv."</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Wells were associated with aquifers using well screen elevations and ancillary aquifer information from the source database. Aquifer was not used as an explanatory variable in the model, but it was used to attribute explanatory variables that varied by aquifer to wells and to categorize water-quality data by aquifer. A Python script was used to determine the intersections of well screen elevations with the top and bottom elevations of the 19 hydrogeologic units (aquifers and confining units). Hydrogeologic unit elevations were as represented in the regional groundwater flow model (MODFLOW) of Masterson and others (2016), modified from Pope and others (2016). Well screen top and bottom elevations were calculated from the well minimum and maximum open-interval depths in the data sources and from land surface elevation from Pope and others (2016). Any missing well screen bottom elevation was calculated from well depth, if available, and any missing screen top elevation was set to land surface. The Python script output and ancillary information were then manually reviewed to determine the final aquifer designation. Wells with screened intervals that fell within the top and bottom elevations of an aquifer hydrogeologic unit were assigned to that aquifer. As expected, the regional-scale data on unit occurrence and elevations did not always agree with the local-scale data for wells. Wells with screens that intersected an aquifer unit but also extended into an adjacent confining unit were assigned to that aquifer, especially if ancillary aquifer information from the source database agreed.
Wells were omitted for any of the following reasons: both well depth and depth to the maximum open interval were missing; well screens were entirely within confining units and there was no ancillary aquifer information; well screens spanned more than one aquifer separated by a confining unit; or a well was missing a screen top elevation and the aquifer at the bottom of the well screen was not the aquifer indicated by the ancillary information from the source database.
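
The screen-elevation and overlap logic described above can be illustrated with a minimal Python sketch. It is not the script used for this data release: the HydroUnit record, the function names, the strict-inequality overlap test, and the assumption that depths and elevations share the same length unit are all illustrative assumptions. The sketch returns only a candidate aquifer; cases with no intersected aquifer or more than one are left for the ancillary-information check and manual review described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HydroUnit:
    """Hypothetical record for one hydrogeologic unit in the vertical stack."""
    name: str
    top_elev: float     # elevation of the unit top
    bottom_elev: float  # elevation of the unit bottom
    is_aquifer: bool    # False for confining units


def screen_elevations(land_surface: float,
                      top_depth: Optional[float],
                      bottom_depth: Optional[float],
                      well_depth: Optional[float]) -> Optional[Tuple[float, float]]:
    """Convert open-interval depths to (top, bottom) screen elevations."""
    if bottom_depth is None:
        bottom_depth = well_depth  # missing screen bottom: fall back to well depth
    if bottom_depth is None:
        return None                # no depth information: well is omitted
    if top_depth is None:
        top_depth = 0.0            # missing screen top: use land surface
    return land_surface - top_depth, land_surface - bottom_depth


def candidate_aquifer(screen: Optional[Tuple[float, float]],
                      units: List[HydroUnit]) -> Optional[str]:
    """Return the single aquifer intersected by the screen, or None for review."""
    if screen is None:
        return None
    screen_top, screen_bottom = screen
    # A unit intersects the screen when the two elevation ranges overlap
    # (assumption: ranges that only touch at a boundary do not count).
    hits = [u for u in units
            if screen_top > u.bottom_elev and u.top_elev > screen_bottom]
    aquifers = [u.name for u in hits if u.is_aquifer]
    # Zero aquifers (confining unit only) or several aquifers: defer to
    # ancillary information and manual review rather than guessing.
    return aquifers[0] if len(aquifers) == 1 else None
```

Returning None instead of forcing a choice keeps ambiguous wells visible for the manual review step rather than silently assigning them.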

References:
Masterson, J.P., Pope, J.P., Fienen, M.N., Monti, Jack Jr., Nardi, M.R., and Finkelstein, J.S., 2016, Documentation of a groundwater flow model developed to assess groundwater availability in the Northern Atlantic Coastal Plain aquifer system from Long Island, New York, to North Carolina (ver. 1.1, December 2016): U.S. Geological Survey Scientific Investigations Report 2016–5076, 70 p., http://dx.doi.org/10.3133/sir20165076.

Pope, J.P., Andreasen, D.C., McFarland, E.R., and Watt, M.K., 2016, Digital elevations and extents of regional hydrogeologic units in the Northern Atlantic Coastal Plain aquifer system from Long Island, New York, to North Carolina: U.S. Geological Survey Data Series 996, 28 p., http://dx.doi.org/10.3133/ds996.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Distance from the Fall Zone (dist_fallzone) was calculated using the ArcMap (v. 10.6.1) “Near” Analysis tool. This distance represents the shortest distance to the western boundary of the study area (the Fall Zone) or, on Long Island, the distance to a line representing the maximum water table of Como and others (2015).
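
The shortest-distance calculation can be sketched in Python as follows, assuming planar coordinates in meters (as in the Albers projection of this data release) and a boundary line (the Fall Zone or the Long Island line) supplied as an ordered list of vertices. This illustrates a point-to-polyline distance in general; it is not the ArcMap "Near" implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in projected meters (assumption)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest planar distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        # Degenerate segment: distance to the single vertex.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the infinite line, then clamp to the segment ends.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_line(p: Point, vertices: Sequence[Point]) -> float:
    """Shortest distance from p to a polyline given by its ordered vertices."""
    return min(point_segment_distance(p, vertices[i], vertices[i + 1])
               for i in range(len(vertices) - 1))
```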

Reference:
Como, M.D., Noll, M.L., Finkelstein, J.S., Monti, J., Jr., and Busciolano, R., 2015, Water-table and potentiometric-surface altitudes in the Upper Glacial, Magothy, and Lloyd aquifers of Long Island, New York, April–May 2013: U.S. Geological Survey Scientific Investigations Map 3326, http://dx.doi.org/10.3133/sim3326.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>For the wetland variable (wet_all), separate data sets for individual states in the study area (NY, NJ, DE, NC, MD, PA, and VA) were clipped to the study area and combined using data analysis tools of ArcMap (v. 10.6.1). Areas of the following wetland classes of U.S. Fish and Wildlife Service (2019) were combined into a single spatial variable to represent wetland area: estuarine and marine wetland, freshwater emergent wetland, freshwater forested/shrub wetland, freshwater pond, riverine, lake, and other.

Reference:
U.S. Fish and Wildlife Service, 2019, National Wetlands Inventory website: U.S. Department of the Interior, Fish and Wildlife Service, Washington, D.C., http://www.fws.gov/wetlands/, accessed March 14, 2020.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Explanatory variables representing land-surface characteristics (bfi, DSD1, DSD5, DSD9, evapot, fert_N, LP1, LP5, LP9, ls_elev, recharge, soil_clay_pct, soil_sand_pct, soil_silt_pct, wet_all) were attributed to wells and to prediction grid points by either a point extraction or an aggregation of values within 500-meter circular buffer areas surrounding the well or grid point locations. Point extractions were done in ESRI’s ArcMap software (v. 10.6.1) using the “Identity” Analysis tool for vector data layers or the “Extract Multi Values to Points” Spatial Analyst tool for raster data. Aggregation of values within buffer areas was done for raster data only: a Python script (zonepy; Clark and others, 2019) subdivided source-data raster cells into smaller areas and then aggregated values within the buffer areas. Area-weighted means (numeric or percentage values) were then calculated for the buffer areas.
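
The subdivide-and-aggregate approach can be illustrated with a minimal Python sketch. It is a hypothetical stand-in, not the zonepy code: the function name and arguments, the rule that a subcell is counted when its center lies inside the 500-meter circle, and the default of 10 by 10 subcells per raster cell are assumptions for illustration. Because all subcells have the same area, the result is the mean of source-cell values weighted by the approximate area of each cell inside the buffer.

```python
import math
from typing import List, Optional


def buffer_mean(grid: List[List[Optional[float]]], x0: float, y0: float,
                cell: float, px: float, py: float,
                radius: float = 500.0, n_sub: int = 10) -> Optional[float]:
    """Approximate area-weighted mean of raster values in a circular buffer.

    grid[row][col] holds cell values (None = no data); (x0, y0) is the
    upper-left corner of the raster and rows run southward. Each cell is
    split into n_sub x n_sub subcells, and a subcell counts toward the
    buffer when its center lies inside the circle centered at (px, py).
    """
    sub = cell / n_sub
    sub_area = sub * sub
    weight = 0.0
    total = 0.0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:
                continue  # skip no-data cells
            for i in range(n_sub):
                for j in range(n_sub):
                    # Center of subcell (i, j) within raster cell (r, c).
                    cx = x0 + c * cell + (j + 0.5) * sub
                    cy = y0 - r * cell - (i + 0.5) * sub
                    if radius >= math.hypot(cx - px, cy - py):
                        weight += sub_area
                        total += value * sub_area
    # None when no valid subcell falls inside the buffer.
    return total / weight if weight else None
```

Finer subdivision (a larger n_sub) approximates the true circle-cell overlap areas more closely at the cost of more computation.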

Reference:
Clark, B.R., Knierim, K.J., and Duncan, L., 2019, zonepy, version 0.0, https://github.com/brclark-usgs/zonepy</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Variables from the groundwater flow model and groundwater residence times (age) (age_10_pctl, age_50th_pctl, age_90th_pctl, aq_hk, hk_thk, gw_flux, gw_head) were attributed to wells by setting them equal to the values of the 1-square-mile (2.6-square-kilometer) MODFLOW groundwater flow model grid cell in which the well was located (Masterson and others, 2016), for the model layer corresponding to the aquifer designation of the well. Flow model and residence time variables were attributed to prediction grid points by using ArcGIS to intersect the 1-square-mile model grid cells with the prediction grid and then calculating in R the area-weighted averages of MODFLOW variables within prediction grid cells, for each aquifer.
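
The area-weighted averaging step can be sketched as follows. The original calculation was done in R; this Python version is only an illustration, and it assumes the ArcGIS intersection has already been reduced to one record per intersection polygon giving the prediction cell identifier, the flow-model cell value, and the overlap area. The function name is hypothetical. To handle aquifers, the identifier can be an (aquifer, cell) pair, or the function can be run once per model layer.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def area_weighted_by_cell(
        pieces: Iterable[Tuple[Hashable, float, float]]) -> Dict[Hashable, float]:
    """Area-weighted mean of a flow-model variable for each prediction cell.

    Each piece is (prediction_cell_id, model_cell_value, overlap_area), one
    per polygon produced by intersecting model cells with prediction cells.
    """
    weighted_sum: Dict[Hashable, float] = defaultdict(float)
    area_sum: Dict[Hashable, float] = defaultdict(float)
    for cell_id, value, area in pieces:
        weighted_sum[cell_id] += value * area
        area_sum[cell_id] += area
    # Skip cells with zero total overlap to avoid division by zero.
    return {cid: weighted_sum[cid] / area_sum[cid]
            for cid in weighted_sum if area_sum[cid] > 0}
```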

Reference:
Masterson, J.P., Pope, J.P., Fienen, M.N., Monti, Jack Jr., Nardi, M.R., and Finkelstein, J.S., 2016, Documentation of a groundwater flow model developed to assess groundwater availability in the Northern Atlantic Coastal Plain aquifer system from Long Island, New York, to North Carolina (ver. 1.1, December 2016): U.S. Geological Survey Scientific Investigations Report 2016–5076, 70 p., http://dx.doi.org/10.3133/sir20165076.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>The variables of predicted pH (pred_ph) and the predicted probability of dissolved oxygen less than 2 milligrams per liter (pred_do2) were attributed to wells by setting them equal to the values for the 1-square-kilometer prediction grid cell and aquifer in which the well was located. Because the same prediction grid used in the present study was used to create the pred_ph and pred_do2 variables, these variables were assigned to prediction grid points directly from their source data (DeSimone and Pope, 2020).

Reference:
DeSimone, L.A., and Pope, J.P., 2020, Data used to model and map pH and redox conditions in the Northern Atlantic Coastal Plain aquifer system, eastern USA: U.S. Geological Survey data release, https://doi.org/10.5066/P94DYERF.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Separate data files of explanatory variables were created for each of the 10 regional aquifers in the aquifer system. Variables representing surficial and other characteristics that did not vary by aquifer were the same in all 10 files. Variables that varied by aquifer, including variables based on the groundwater flow model, groundwater residence time, and predicted pH and dissolved oxygen probability, were set to the values appropriate for the aquifer layer. The variables of well depth and screen bottom elevation for each aquifer were set equal to the values corresponding to the vertical midpoint of each aquifer at each grid cell. The predevelopment water table, as simulated by the groundwater flow model, was used as the top of the surficial aquifer for these calculations.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>The XGBoost method (Chen and Guestrin, 2016; Chen and others, 2020) was used to fit a machine learning model to the well data, as described in DeSimone and Ransom (2021). The resulting model is a classification model that predicts manganese concentration within four classes: less than 10 micrograms per liter (ug/L), 10 to 50 ug/L, 50 to 300 ug/L, and greater than 300 ug/L. During model development, the model was used to predict the manganese concentration class for each well to evaluate model fit. The model was then used to make predictions at the prediction grid points across the study area for each of the 10 regional aquifers.
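
The four class breaks and a final class assignment can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, placing values exactly equal to a class break (10, 50, or 300 ug/L) in the higher class is an assumption, and choosing the class with the highest predicted probability is an assumption about how the final predicted class is derived from the four class probabilities. The published model (DeSimone and Ransom, 2021) may apply a different rule, for example to address class imbalance.

```python
from typing import Sequence

# Class breaks in ug/L from the class definitions above. Whether a value
# exactly at a break falls in the lower or higher class is an assumption
# (here: higher).
CLASS_BREAKS_UGL = (10.0, 50.0, 300.0)


def concentration_class(conc_ugl: float) -> int:
    """Return manganese concentration class 1-4 for a value in ug/L."""
    mn_class = 1
    for brk in CLASS_BREAKS_UGL:
        if conc_ugl >= brk:
            mn_class += 1
    return mn_class


def most_probable_class(probs: Sequence[float]) -> int:
    """Return the 1-based class with the highest predicted probability.

    Assumption: the final class is the arg-max of the four probabilities.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best + 1
```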

References:
Chen, T., and Guestrin, C., 2016, XGBoost: A scalable tree boosting system, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, https://doi.org/10.1145/2939672.2939785.

Chen, T., He, T., Benesty, M., and Guestrin, C., 2020, Package ‘xgboost’: accessed October 16, 2020, at https://cran.r-project.org/web/packages/xgboost/xgboost.pdf.

DeSimone, L.A., and Ransom, K.M., 2021, Manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA—Modeling regional occurrence with pH, redox, and machine learning: Journal of Hydrology: Regional Studies, v. 37, 100925, https://doi.org/10.1016/j.ejrh.2021.100925.</procdesc>
        <procdate>2020</procdate>
      </procstep>
      <procstep>
        <procdesc>Tif-format raster files of predictions were created in R by replacing the grid cell identification values (gridcode) in the study-area raster, which was derived from the national grid raster, with the predicted values.</procdesc>
        <procdate>2020</procdate>
      </procstep>
    </lineage>
  </dataqual>
  <spdoinfo>
    <indspref>See the metadata describing individual data tables and raster files for spatial data organization</indspref>
    <direct>Raster</direct>
  </spdoinfo>
  <spref>
    <horizsys>
      <planar>
        <mapproj>
          <mapprojn>Albers Conical Equal Area</mapprojn>
          <albers>
            <stdparll>29.5</stdparll>
            <stdparll>45.5</stdparll>
            <longcm>-96.0</longcm>
            <latprjo>23.0</latprjo>
            <feast>0.0</feast>
            <fnorth>0.0</fnorth>
          </albers>
        </mapproj>
        <planci>
          <plance>coordinate pair</plance>
          <coordrep>
            <absres>1000.0</absres>
            <ordres>1000.0</ordres>
          </coordrep>
          <plandu>METERS</plandu>
        </planci>
      </planar>
      <geodetic>
        <horizdn>WGS84</horizdn>
        <ellips>WGS84</ellips>
        <semiaxis>6378137.0</semiaxis>
        <denflat>298.257223563</denflat>
      </geodetic>
    </horizsys>
  </spref>
  <eainfo>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_explanatory_variables.csv</enttypl>
        <enttypd>Tabular, comma-delimited file defining the explanatory variables (predictors) listed in the well and prediction input files and used in the model</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>Variable_name</attrlabl>
        <attrdef>Name of the explanatory variable, corresponding to column headings in well and prediction input files and to variable names in models</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>A short name for the explanatory variable</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Description</attrlabl>
        <attrdef>A short description of the explanatory variable</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>Text defining the explanatory variable</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Varies_by_aquifer</attrlabl>
        <attrdef>Describes whether the predictor is different for each of the 10 regional aquifers or is the same for all aquifers</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>yes</edomv>
            <edomvd>The predictor is different for each of the 10 regional aquifers</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>no</edomv>
            <edomvd>The predictor is the same for all of the regional aquifers</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Units</attrlabl>
        <attrdef>Lists the units of the variable</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>Text describing variable units</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Time_period</attrlabl>
        <attrdef>Describes the time period associated with the variable</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>The time period is listed for variables that are associated with specific time periods and "NA" is listed for variables not associated with a specific time period</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Method_of_attribution</attrlabl>
        <attrdef>Method of attributing variable to well or prediction grid point</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>See the "Data Quality" section of this data release and the supplementary information of DeSimone and others (2020) for explanation of attribution processes

Reference:
DeSimone, L.A., Pope, J.P., and Ransom, K.M., 2020, Machine learning models of pH and dissolved oxygen in the Northern Atlantic Coastal Plain aquifer system, eastern USA: Journal of Hydrology: Regional Studies, v. 30, 100697, https://doi.org/10.1016/j.ejrh.2020.100697</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Source_short_citations</attrlabl>
        <attrdef>Short citation or citations for the data source, separated by semicolons</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>Author(s) and publication date for the data source(s)</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Source_full_citation1</attrlabl>
        <attrdef>Complete reference for the first (or only) data source</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>Author(s), publication date, title, publisher, and online link for the data source</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Source_full_citation2</attrlabl>
        <attrdef>Complete reference for the second data source, if there is one</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>Author(s), publication date, title, publisher, and online link for the data source if there is a second data source; otherwise "NA"</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_well_data.csv</enttypl>
        <enttypd>Tabular, comma-delimited file containing well data used in model development, including manganese concentrations and concentration classes, aquifer, explanatory variables, whether the well was used to train or test the model, and model predictions. Model predictions consist of the predicted probability of membership in each of the four concentration classes and the final predicted class for each well. Data columns for the explanatory variables are described in a separate file, "NACP_Mn_explanatory_variables.csv"</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>well_seqno</attrlabl>
        <attrdef>A sequential numeric identifier for the well that also indicates the sort order of the well data file when used as input to the model</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>1</rdommin>
            <rdommax>4492</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>STAID</attrlabl>
        <attrdef>Well identifier</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>A numeric or alphanumeric identifier that uniquely identifies the well.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>data_source</attrlabl>
        <attrdef>Source of the well data</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NWIS</edomv>
            <edomvd>U.S. Geological Survey National Water Information System database</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>SDWIS</edomv>
            <edomvd>U.S. Environmental Protection Agency Safe Drinking Water Information System database</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>GWAM</edomv>
            <edomvd>State agency databases</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>SCWA</edomv>
            <edomvd>Suffolk County Water Authority</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>aquifer</attrlabl>
        <attrdef>Name of the regional aquifer from which the well withdraws water</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>1_Surficial</edomv>
            <edomvd>Surficial aquifer, layer 1 from the surface in the vertical stack of 19 hydrogeologic units (10 regional aquifers and 9 intervening confining units)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>3_UpperChesapeake</edomv>
            <edomvd>Upper Chesapeake aquifer, layer 3 from the surface in the vertical stack of 19 hydrogeologic units (10 regional aquifers and 9 intervening confining units)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>5_LowerChesapeake</edomv>
            <edomvd>Lower Chesapeake aquifer, layer 5 from the surface in the vertical stack of 19 hydrogeologic units (10 regional aquifers and 9 intervening confining units)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>7_PineyPoint</edomv>
            <edomvd>Piney Point aquifer, layer 7 from the surface in the vertical stack of 19 hydrogeologic units (10 regional aquifers and 9 intervening confining units)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>9_Aquia</edomv>
            <edomvd>Aquia aquifer, layer 9 from the surface in the vertical stack of 19 regional aquifers and intervening confining units</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>11_MonmouthMtLaurel</edomv>
            <edomvd>Monmouth-Mt. Laurel aquifer, layer 11 from the surface in the vertical stack of 19 regional aquifers and intervening confining units</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>13_Matawan</edomv>
            <edomvd>Matawan aquifer, layer 13 from the surface in the vertical stack of 19 regional aquifers and intervening confining units</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>15_Magothy</edomv>
            <edomvd>Magothy aquifer, layer 15 from the surface in the vertical stack of 19 regional aquifers and intervening confining units</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>17_PotomacPatapsco</edomv>
            <edomvd>Potomac-Patapsco aquifer, layer 17 from the surface in the vertical stack of 19 regional aquifers and intervening confining units</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>19_PotomacPatuxent</edomv>
            <edomvd>Potomac-Patuxent aquifer, layer 19 from the surface in the vertical stack of 19 regional aquifers and intervening confining units</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No aquifer is associated with the well</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>mn_rmk</attrlabl>
        <attrdef>Remark value for manganese concentration</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>&lt;</edomv>
            <edomvd>Less than</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No remark</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>mn_val</attrlabl>
        <attrdef>Value for manganese concentration</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0.001</rdommin>
            <rdommax>75100</rdommax>
            <attrunit>Micrograms per liter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
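<!--
The four concentration classes can be applied to an observed result (mn_rmk, mn_val) with a short function. This is a minimal Python sketch, not the authors' procedure. It assumes that values of exactly 50 or 300 micrograms per liter fall in the lower class, and that a result remarked "<" can be assigned a class only when its reported value is 10 micrograms per liter or less; otherwise the class is ambiguous and None is returned. The function name observed_class is hypothetical.

```python
from typing import Optional


def observed_class(mn_rmk: Optional[str], mn_val: float) -> Optional[int]:
    """Map a manganese result (micrograms per liter) to class 1, 2, 3, or 4.

    Assumptions (not stated in this metadata): values of exactly 50 or
    300 fall in the lower class, and a censored result ("<" remark) is
    assignable only when its reported value is 10 or less.
    """
    if mn_rmk == "<":
        # "< X" means the true value is below X; only X <= 10 pins class 1.
        return 1 if mn_val <= 10 else None
    if mn_val < 10:
        return 1
    if mn_val <= 50:
        return 2
    if mn_val <= 300:
        return 3
    return 4
```

For example, observed_class("<", 20) returns None, because a result below 20 micrograms per liter could belong to class 1 or class 2.
-->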
      <attr>
        <attrlabl>predprob_class1</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the well are within the range of class 1 (less than 10 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class2</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the well are within the range of class 2 (10 to 50 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class3</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the well are within the range of class 3 (50 to 300 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class4</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the well are within the range of class 4 (greater than 300 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predclass</attrlabl>
        <attrdef>Model-predicted concentration class for manganese concentrations at the well. This is the class that had the highest predicted probability of the four classes.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>1</edomv>
            <edomvd>class 1: less than 10 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>2</edomv>
            <edomvd>class 2: 10 to 50 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>3</edomv>
            <edomvd>class 3: 50 to 300 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>4</edomv>
            <edomvd>class 4: greater than 300 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Attributes consisting of the explanatory variables as defined in a separate file</attrlabl>
        <attrdef>These attributes are the 27 explanatory variables (predictors) used to develop the model and make predictions. They are defined and described in a separate file, "NACP_Mn_explanatory_variables.csv" (columns "age_10th_pctl" through "soil_sand_pct").</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>See the separate file, "NACP_Mn_explanatory_variables.csv"</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>data_type</attrlabl>
        <attrdef>Field that indicates whether the well was used for training or testing the model.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>Train</edomv>
            <edomvd>The well was used to train the model.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>Test</edomv>
            <edomvd>The well was used to test the model.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_prediction_input_aquifers.zip</enttypl>
        <enttypd>A zip file containing 10 tabular, comma-delimited files of input data used to predict manganese concentration class for the 10 regional aquifers in the study area. Each file contains a grid cell identifier attribute (gridcode) and the explanatory variable data used to predict manganese concentration class probabilities at the central nodal points of a 1-square-kilometer grid across the study area. The filenames consist of "nacp_mn_predgridin_" followed by a code for the aquifer. The codes for the aquifers are listed in the Abstract. Column headings are described in a separate file, "NACP_Mn_explanatory_variables.csv".</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>gridcode</attrlabl>
        <attrdef>Number identifying the grid cell from the national grid source data layer, described in the first process step of this metadata.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>This attribute corresponds to the cell value in the national 1-km raster, described in the first process step of this metadata.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Attributes consisting of the explanatory variables as defined in a separate file</attrlabl>
        <attrdef>These attributes are the 27 explanatory variables used to develop the model and make predictions. They are defined and described in a separate file, "NACP_Mn_explanatory_variables.csv".</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>See the separate file, "NACP_Mn_explanatory_variables.csv"</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_prediction_output_aquifers_data.zip</enttypl>
        <enttypd>A zip file containing 10 tabular, comma-delimited files of model predictions of manganese concentration class and class probabilities for the 10 regional aquifers in the study area. Each csv file contains a grid cell identifier (gridcode) and the 6 model output values at the central nodal points of a 1-square-kilometer grid across the study area. Raster tif files of the same model outputs, corresponding spatially to the prediction grid, are provided separately in NACP_Mn_prediction_output_aquifers_rasters.zip. The filenames consist of "predout_ph_" followed by a code for the aquifer. The codes for the aquifers are listed in the Abstract.</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>gridcode</attrlabl>
        <attrdef>Sequential number identifying the grid cell of the prediction grid for which the prediction is made</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>This attribute corresponds to the cell value in the national 1-km raster.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class1_aqX, where "X" is a number designating the aquifer for which the prediction is made</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 1 (less than 10 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class2_aqX, where "X" is a number designating the aquifer for which the prediction is made</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 2 (10 to 50 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class3_aqX, where "X" is a number designating the aquifer for which the prediction is made</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 3 (50 to 300 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_class4_aqX, where "X" is a number designating the aquifer for which the prediction is made</attrlabl>
        <attrdef>Model-predicted probability that manganese concentrations at the grid cell point are within the range of class 4 (greater than 300 micrograms per liter)</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predclass_aqX, where "X" is a number designating the aquifer for which the prediction is made</attrlabl>
        <attrdef>Model-predicted concentration class for manganese concentrations at the grid cell point. This is the class that had the largest predicted probability of the four classes.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>1</edomv>
            <edomvd>class 1: less than 10 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>2</edomv>
            <edomvd>class 2: 10 to 50 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>3</edomv>
            <edomvd>class 3: 50 to 300 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>4</edomv>
            <edomvd>class 4: greater than 300 micrograms per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predprob_predclass_aqX, where "X" is a number designating the aquifer for which the prediction is made</attrlabl>
        <attrdef>Model-predicted probability of the predicted class at the grid cell point. This is the largest of the four predicted class probabilities.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>1</rdommax>
            <attrunit>unitless</attrunit>
          </rdom>
        </attrdomv>
      </attr>
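<!--
Because predclass_aqX is defined as the class with the largest of the four predicted probabilities, and predprob_predclass_aqX as that largest probability, both can be recomputed from the predprob_class columns. This is a minimal Python sketch, assuming each csv row has already been read into a dictionary of floats (for example with csv.DictReader followed by type conversion). The function name predicted_class and the tie-breaking rule (lowest class number wins) are assumptions, not taken from the model code.

```python
from typing import Dict, Tuple


def predicted_class(row: Dict[str, float], aq: int) -> Tuple[int, float]:
    """Return (predclass_aqX, predprob_predclass_aqX) for one grid-cell row.

    `row` maps column names such as "predprob_class1_aq3" to probabilities;
    `aq` is the aquifer number that replaces "X" in the column names.
    """
    probs = [row[f"predprob_class{k}_aq{aq}"] for k in (1, 2, 3, 4)]
    best = max(probs)
    # Ties (assumed rare) resolve to the lowest class number.
    return probs.index(best) + 1, best
```

For a hypothetical row in aquifer 3 with class probabilities 0.1, 0.2, 0.6, and 0.1, predicted_class returns (3, 0.6).
-->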
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_prediction_output_aquifers_rasters.zip</enttypl>
        <enttypd>A zip file containing tif-format raster files of model predictions for the 10 regional aquifers in the study area. For each of the 6 model output values at the central nodal points of a 1-square-kilometer grid across the study area, there are 10 files, one for each aquifer (60 files in total). The following terms in the filenames designate the type of model output in the file: predclass, predprob_class1, predprob_class2, predprob_class3, predprob_class4, and predprob_predclass. These model output values are defined as follows. Predclass is the model-predicted concentration class at the grid point. Predprob_class1, predprob_class2, predprob_class3, and predprob_class4 are the model-predicted probabilities of membership in concentration classes 1 through 4, respectively, at the grid point. Predprob_predclass is the model-predicted probability of the predicted class at the grid point. Each filename also includes a code for the aquifer, as listed in the Abstract.</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
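<!--
The exact raster filenames are not listed in this metadata, so identifying the output type from a filename means searching for the six terms that designate it. One pitfall: "predclass" is a substring of "predprob_predclass", so the longer term must be tested first. This is a minimal Python sketch, assuming each filename contains exactly one of the six terms; the function name output_type and the example filenames are hypothetical.

```python
from typing import Optional

# The six output-type terms used in the raster filenames. "predprob_predclass"
# is listed first because it contains "predclass" as a substring.
OUTPUT_TERMS = (
    "predprob_predclass",
    "predprob_class1",
    "predprob_class2",
    "predprob_class3",
    "predprob_class4",
    "predclass",
)


def output_type(filename: str) -> Optional[str]:
    """Return the model output type a raster filename holds, or None."""
    name = filename.lower()
    for term in OUTPUT_TERMS:
        if term in name:
            return term
    return None
```

With the terms tested in this order, a hypothetical file named "mn_predprob_predclass_aq5.tif" is reported as predprob_predclass rather than predclass.
-->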
      <attr>
        <attrlabl>Raster values correspond to one of six types of model output values; there are no other attributes.</attrlabl>
        <attrdef>Raster values correspond to one of six types of model output values, indicated by the filename, as follows. Predclass is the model-predicted concentration class at the grid point. Predprob_class1, predprob_class2, predprob_class3, and predprob_class4 are the model-predicted probabilities of membership in concentration classes 1 through 4, respectively, at the grid point. Predprob_predclass is the model-predicted probability of the predicted class at the grid point.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>See the Entity Type and Attribute descriptions</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_prediction_grid.zip</enttypl>
        <enttypd>A zip file containing a tif-format raster of the 1-square-kilometer grid used for prediction across the study area and a comma-delimited file, "gridcode_pred.csv", containing the gridcodes of cells for which predictions were made within the study area.</enttypd>
        <enttypds>Clark and others (2018), clipped to the study area

Reference:

Clark, B.R., Barlow, P.M., Peterson, S.M., Hughes, J.D., Reeves, H.W., and Viger, R.J., 2018, National-scale grid to support regional groundwater availability studies and a national hydrogeologic database: U.S. Geological Survey data release, https://doi.org/10.5066/F7P84B24</enttypds>
      </enttyp>
      <attr>
        <attrlabl>gridcode (the raster value in tif files, not a named attribute)</attrlabl>
        <attrdef>Sequential number identifying the grid cell of the prediction grid (Clark and others, 2018) for which the prediction is made</attrdef>
        <attrdefs>Clark and others (2018)</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>8111983</rdommin>
            <rdommax>12070911</rdommax>
          </rdom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACP_Mn_model.zip</enttypl>
        <enttypd>A zip file containing the model object file, in R data format, and an R script used to read and run the model to make predictions. The zip file also includes a text README file providing information on software and systems used to develop and run the model.</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>Files in the zip file are described in a text README file; there are no data table attributes.</attrlabl>
        <attrdef>See the text README file within the zip file for a detailed description of the zip file contents; there are no data table attributes.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>See the Entity Type and Attribute descriptions</udom>
        </attrdomv>
      </attr>
    </detailed>
  </eainfo>
  <distinfo>
    <distrib>
      <cntinfo>
        <cntperp>
          <cntper>GS ScienceBase</cntper>
          <cntorg>U.S. Geological Survey</cntorg>
        </cntperp>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>Denver Federal Center, Building 810, Mail Stop 302</address>
          <city>Denver</city>
          <state>CO</state>
          <postal>80225</postal>
          <country>United States</country>
        </cntaddr>
        <cntvoice>1-888-275-8747</cntvoice>
        <cntemail>sciencebase@usgs.gov</cntemail>
      </cntinfo>
    </distrib>
    <distliab>Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data have been processed successfully on a computer system at the USGS, no warranty expressed or implied is made regarding the display or utility of the data for other purposes, nor on all computer systems, nor shall the act of distribution constitute any such warranty. The USGS or the U.S. Government shall not be held liable for improper or incorrect use of the data described and/or contained herein. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.</distliab>
  </distinfo>
  <metainfo>
    <metd>20211018</metd>
    <metc>
      <cntinfo>
        <cntperp>
          <cntper>Leslie A. DeSimone</cntper>
          <cntorg>New England Water Science Center</cntorg>
        </cntperp>
        <cntpos>Hydrologist</cntpos>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>10 Bearfoot Road</address>
          <city>Northborough</city>
          <state>MA</state>
          <postal>01532</postal>
          <country>US</country>
        </cntaddr>
        <cntvoice>508-490-5023</cntvoice>
        <cntemail>ldesimon@usgs.gov</cntemail>
      </cntinfo>
    </metc>
    <metstdn>Content Standard for Digital Geospatial Metadata</metstdn>
    <metstdv>FGDC-STD-001-1998</metstdv>
  </metainfo>
</metadata>
