<?xml version='1.0' encoding='UTF-8'?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Catherine S. Jarnevich</origin>
        <origin>Peder Engelstad</origin>
        <origin>Demetra A. Williams</origin>
        <origin>Keana S. Shadwell</origin>
        <origin>Cameron J. Reimer</origin>
        <origin>Grace C. Henderson</origin>
        <origin>Linnea Fraser</origin>
        <origin>Shelby LeClare</origin>
        <origin>Rich Inman</origin>
        <origin>Ian Pfingsten</origin>
        <origin>Wesley Daniel</origin>
        <pubdate>20260123</pubdate>
        <title>Aqua INHABIT species potential distribution across the contiguous United States</title>
        <geoform>tabular and raster digital data</geoform>
        <onlink>https://doi.org/10.5066/P13JMOQW</onlink>
      </citeinfo>
    </citation>
    <descript>
      <abstract>This dataset contains the potential distributions of 103 invasive species found in or adjacent to freshwater environments. We developed habitat suitability models for invasive freshwater species selected by resource management agencies and other managers. We adapted the modeling workflow described in Jarnevich et al. (2024, https://doi.org/10.3897/neobiota.96.134842). We developed a national library of environmental variables known to physiologically limit freshwater species distributions (Henderson et al. 2025, https://doi.org/10.5066/P14JDTTJ) and relied on human input based on natural history knowledge to narrow the variable set for each species before developing habitat suitability models. We developed models using five algorithms (boosted regression tree, generalized linear model, multivariate adaptive regression splines, maxent, and random forest) with VisTrails: Software for Assisted Habitat Modeling (SAHM 2.2.2, Morisette et al., 2013: https://doi.org/10.1111/j.1600-0587.2012.07815.x). For each species, we generated models for waterbody and stream environments where enough data were available in each system. For several predictors, future scenarios were available, with up to five alternatives. We combined stream and waterbody models into a single map representing freshwater environments using a weighted ensemble across algorithms, including current and alternative future conditions. We also calculated subwatershed (Hydrologic Unit Code [HUC] 12) summaries of information related to pathways of spread, and finally integrated this information with current and alternative future habitat suitability to produce a risk score for each HUC12 east of the Mississippi River.
This data bundle contains a single file of tabular summaries by management unit (including each species/ensemble type/abundance level combination), the merged data sets used to create models, tabular outputs including response curve data, variable importance information, model assessment metrics, a single file of pathways and risk scores, a spatial vector layer of the subwatersheds to which pathways and risk are summarized, a species metadata file, and R scripts to produce the model inputs and final products. 

The bundle documentation files are:
1) 'AquaINHABIT_V1_metadata.xml' (this file) which contains the project-level metadata.
2) 'Species_model_information.csv' contains information on the species-specific model changes made while tuning algorithm parameters to ensure model quality.
3) 'Merged_dataset.csv' contains the merged data set used to create the models, including location and associated environmental data, for each species.
4) 'Variable_importance.csv' contains tabular summaries of predictor importance for each of the models produced for each species.
5) 'Assessment_metrics.csv' contains tabular summaries of assessment metrics for each model or ensemble for each species.
6) 'Risk_and_pathways_scores.csv' contains summarized values for pathways, current suitability, future suitability, and a combined risk score for each species by HUC12.
7) 'WBD_HUC12_EasternUS.gpkg' is the spatial vector file of the HUC12 boundaries used for analyses and summarizations.
8) 'Rcode.zip' is a zipped file of the R scripts used to pull and prepare the data included in the merged dataset, to make the derived raster outputs, and to calculate the pathway and suitability summaries along with the final risk score.

There are also three child items.
1) The "response curves" child item contains files named XX_response_curves.csv, which hold the tabular information needed to produce response curves for each predictor retained in each of the up to 10 models produced for each species, where XX is the species code from 'Species_model_information.csv'.
2) The "management summaries" child item contains files named Management_summaries_XX.csv, which are the tabular summaries by management area, where XX indicates the management area group. The management area groups are HUC12 and US Army Corps of Engineers (USACE) project areas, with one file for USACE and the HUC12 summaries split by HUC2 watershed (01 to 18).
3) The "rasters" child item contains raster files named XX_YY.tif, where XX is the species code from 'Species_model_information.csv' and YY is the raster type. There are seven rasters for each species:

1) Current occurrence suitability - Continuous value ensemble (XX-ens-current-mean.tif)
2) Restricted current occurrence suitability - Continuous value ensemble with restricted environmental conditions* (XX-ens-current-mean-masked.tif)
3) Future occurrence suitability - Continuous value ensemble (XX-ens-future-mean.tif)
4) Maximum future occurrence suitability - Maximum continuous value ensemble from the five alternative scenarios (XX-ens-future-max-gcm.tif)
5) Minimum future occurrence suitability - Minimum continuous value ensemble from the five alternative scenarios (XX-ens-future-min-gcm.tif) 
6) Standard deviation of future occurrence suitability - Standard deviation of continuous value from each algorithm for the five alternative scenarios (XX-ens-future-stdev.tif)
7) Restricted count - Count of ensembles from each alternative scenario with restricted environmental conditions* (XX-gcm-mask-count.tif)

*Restricted environmental conditions = only display areas where environmental characteristics are inside the range of the values used to develop the model. For example, a location with a minimum winter temperature of 12 C would be outside the range of -10 to 10 C used in model development.

These data will be integrated into the first version of AquaINHABIT, a web application displaying visual and statistical summaries of freshwater habitat suitability models for manager-identified invasive species. These species include: Aldrovanda vesiculosa, Alisma plantago-aquatica, Alosa pseudoharengus, Alternanthera philoxeroides, Ambloplites rupestris, Ameiurus catus, Ameiurus melas, Ameiurus natalis, Arundo donax, Astronotus ocellatus, Azolla cristata, Azolla pinnata, Bithynia tentaculata, Butomus umbellatus, Bythotrephes longimanus, Cabomba caroliniana, Callitriche stagnalis, Canna glauca, Carassius auratus, Carassius gibelio, Channa argus, Chrosomus oreas, Cipangopaludina chinensis, Cipangopaludina japonica, Clarias batrachus, Colocasia esculenta, Corbicula fluminea, Crassula helmsii, Ctenopharyngodon idella, Cyperus blepharoleptos, Cyperus papyrus, Cyprinella lutrensis, Cyprinus carpio, Daphnia lumholtzi, Didymosphenia geminata, Dorosoma cepedianum, Dorosoma petenense, Dreissena bugensis, Dreissena polymorpha, Echinogammarus ischnus, Egeria densa, Egeria najas, Eichhornia azurea, Eichhornia crassipes, Eichhornia paniculata, Eubosmina coregoni, Fallopia japonica, Faxonius rusticus, Faxonius virilis, Gambusia affinis, Gambusia holbrooki, Glyceria maxima, Gymnocephalus cernua, Hemimysis anomala, Hottonia palustris, Hydrilla verticillata, Hydrilla verticillata peregrina, Hydrilla verticillata verticillata, Hydrocharis morsus-ranae, Hydrocotyle ranunculoides, Hygrophila polysperma, Hypophthalmichthys molitrix, Hypophthalmichthys nobilis, Ictalurus furcatus, Ipomoea aquatica, Iris pseudacorus, Lagarosiphon major, Landoltia punctata, Lepomis gulosus, Limnobium laevigatum, Limnobium spongia, Limnocharis flava, Limnophila indica, Limnophila sessiliflora, Ludwigia decurrens, Ludwigia grandiflora, Ludwigia hexapetala, Ludwigia octovalvis, Ludwigia peploides, Ludwigia peruviana, Lythrum hyssopifolia, Lythrum portula, Lythrum salicaria, Marsilea quadrifolia, Melaleuca quinquenervia, Melanoides tuberculata, Mentha aquatica, Micropterus henshalli, Misgurnus anguillicaudatus, Monochoria vaginalis, Monopterus albus, Murdannia keisak, Mylopharyngodon piceus, Myosotis scorpioides, Myriophyllum aquaticum, Myriophyllum heterophyllum, Myriophyllum sibiricum, Myriophyllum spicatum, Najas marina, Najas minor, Nasturtium microphyllum, Nasturtium officinale, Nelumbo lutea, Nelumbo nucifera, Neogobius melanostomus, Nitellopsis obtusa, Nymphaea alba, Nymphaea lotus, Nymphaea mexicana, Nymphoides cristata, Nymphoides grayana, Nymphoides indica, Nymphoides peltata, Oreochromis aureus, Oreochromis niloticus, Oryza sativa, Panicum repens, Persicaria hydropiper, Phalaris arundinacea, Phragmites australis australis, Pistia stratiotes, Pomacea canaliculata, Pomacea maculata, Potamogeton crispus, Potamopyrgus antipodarum, Procambarus clarkii, Proterorhinus semilunaris, Pterygoplichthys anisitsi, Pterygoplichthys disjunctivus, Pterygoplichthys multiradiatus, Pterygoplichthys pardalis, Pylodictis olivaris, Rotala indica, Rotala rotundifolia, Sagittaria guayanensis, Sagittaria montevidensis, Salmo trutta, Salvinia auriculata, Salvinia minima, Salvinia molesta, Salvinia natans, Salvinia oblongifolia, Sander lucioperca, Scardinius erythrophthalmus, Sporobolus anglicus, Stratiotes aloides, Tamarix chinensis and ramosissima, Tamarix sp, Tinca tinca, Trapa bispinosa, Trapa natans, Typha domingensis, Utricularia inflata, and Veronica anagallis-aquatica.</abstract>
      <purpose>To provide information and documentation on the potential distribution of invasive aquatic species in the contiguous United States.</purpose>
    </descript>
    <timeperd>
      <timeinfo>
        <rngdates>
          <begdate>1980</begdate>
          <enddate>2024</enddate>
        </rngdates>
      </timeinfo>
      <current>date of source data</current>
    </timeperd>
    <status>
      <progress>Complete</progress>
      <update>Irregular</update>
    </status>
    <spdom>
      <descgeog>Contiguous United States</descgeog>
      <bounding>
        <westbc>-128.3863</westbc>
        <eastbc>-65.0916</eastbc>
        <northbc>51.2443</northbc>
        <southbc>23.0860</southbc>
      </bounding>
    </spdom>
    <keywords>
      <theme>
        <themekt>ISO 19115 Topic Category</themekt>
        <themekey>biota</themekey>
      </theme>
      <theme>
        <themekt>None</themekt>
        <themekey>Species Distribution Modeling</themekey>
        <themekey>VisTrails</themekey>
        <themekey>SAHM</themekey>
        <themekey>habitat suitability</themekey>
      </theme>
      <theme>
        <themekt>USGS Thesaurus</themekt>
        <themekey>invasive species</themekey>
        <themekey>biogeography</themekey>
        <themekey>modeling</themekey>
        <themekey>habitat suitability indices</themekey>
      </theme>
      <theme>
        <themekt>USGS Metadata Identifier</themekt>
        <themekey>USGS:685039bcd4be025b9e8c6368</themekey>
      </theme>
      <place>
        <placekt>None</placekt>
        <placekey>Contiguous United States</placekey>
      </place>
      <place>
        <placekt>Common geographic areas</placekt>
        <placekey>United States</placekey>
      </place>
    </keywords>
    <accconst>None.  Please see 'Distribution Info' for details.</accconst>
    <useconst>None.  Users are advised to read the dataset's metadata thoroughly to understand appropriate use and data limitations.</useconst>
    <ptcontac>
      <cntinfo>
        <cntperp>
          <cntper>Catherine S Jarnevich</cntper>
          <cntorg>U.S. Geological Survey, Rocky Mountain Region</cntorg>
        </cntperp>
        <cntpos>Ecologist</cntpos>
        <cntaddr>
          <addrtype>mailing</addrtype>
          <address>2150 Centre Avenue Bldg C</address>
          <city>Fort Collins</city>
          <state>CO</state>
          <postal>80526</postal>
        </cntaddr>
        <cntvoice>970-226-9439</cntvoice>
        <cntemail>jarnevichc@usgs.gov</cntemail>
      </cntinfo>
    </ptcontac>
    <datacred>Funding to support AquaINHABIT came from the USGS Northeast Climate Adaptation Science Center, the U.S. Army Corps of Engineers, and the U.S. Geological Survey Invasive Species Program. The U.S. Geological Survey Fort Collins Science Center participated in the project.</datacred>
    <native>Created using version 2.2.2 of the VisTrails/SAHM package (Morisette et al., 2013) and version 4.3.1 of R, including the "taxize", "CoordinateCleaner", and "enmSdmX" packages.

Citations:
Scott Chamberlain, Eduard Szoecs, Zachary Foster, Zebulun Arendsee, Carl Boettiger, Karthik Ram, Ignasi Bartomeus, John Baumgartner, James O'Donnell, Jari Oksanen, Bastian Greshake Tzovaras, Philippe Marchand, Vinh Tran, Maëlle Salmon, Gaopeng Li, and Matthias Grenié. (2020) taxize: Taxonomic information from around the web. R package version 0.9.98. https://github.com/ropensci/taxize

Zizka A, Silvestro D, Andermann T, Azevedo J, Duarte Ritter C, Edler D, Farooq H, Herdean A, Ariza M, Scharn R, Svanteson S, Wengstrom N, Zizka V, Antonelli A (2019). “CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases.” Methods in Ecology and Evolution. doi:10.1111/2041-210X.13152 &lt;https://doi.org/10.1111/2041-210X.13152&gt;, R package version 3.0.1, &lt;https://github.com/ropensci/CoordinateCleaner&gt;.

Smith A, Murphy S, Henderson D, Erickson K (2023). “Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth.” Global Ecology &amp; Biogeography, 32. doi:10.1111/geb.13628 &lt;https://doi.org/10.1111/geb.13628&gt;.</native>
    <crossref>
      <citeinfo>
        <origin>Catherine S. Jarnevich</origin>
        <origin>Peder Engelstad</origin>
        <origin>Demetra Williams</origin>
        <origin>Keana Shadwell</origin>
        <origin>Cameron Reimer</origin>
        <origin>Grace Henderson</origin>
        <origin>Janet S. Prevey</origin>
        <origin>Ian S. Pearse</origin>
        <pubdate>20241125</pubdate>
        <title>Predicted occurrence and abundance habitat suitability of invasive plants in the contiguous United States: updates for the INHABIT web tool</title>
        <geoform>publication</geoform>
        <serinfo>
          <sername>NeoBiota</sername>
          <issue>vol. 96</issue>
        </serinfo>
        <pubinfo>
          <pubplace>n/a</pubplace>
          <publish>Pensoft Publishers</publish>
        </pubinfo>
        <othercit>pp. 261-278</othercit>
        <onlink>https://doi.org/10.3897/neobiota.96.134842</onlink>
      </citeinfo>
    </crossref>
    <crossref>
      <citeinfo>
        <origin>Henderson, Grace</origin>
        <origin>Engelstad, Peder</origin>
        <origin>Reimer, Cameron J</origin>
        <origin>LeClare, Shelby K.</origin>
        <origin>Fraser, Linnea S.</origin>
        <origin>Williams, Demetra A.</origin>
        <origin>Shadwell, Keana S.</origin>
        <origin>Daniel, Wesley M.</origin>
        <origin>Pfingsten, Ian A.</origin>
        <origin>Jarnevich, Catherine S.</origin>
        <pubdate>2025</pubdate>
        <title>National aquatic spatial data for current and near-future conditions for use in ecological models</title>
        <geoform>publication</geoform>
        <onlink>https://doi.org/10.5066/P14JDTTJ</onlink>
      </citeinfo>
    </crossref>
    <crossref>
      <citeinfo>
        <origin>Jeffrey T. Morisette</origin>
        <origin>Catherine S. Jarnevich</origin>
        <origin>Tracy R. Holcombe</origin>
        <origin>Colin B. Talbert</origin>
        <origin>Drew Ignizio</origin>
        <origin>Marian K. Talbert</origin>
        <origin>Claudio Silva</origin>
        <origin>David Koop</origin>
        <origin>Alan Swanson</origin>
        <origin>Nicholas E. Young</origin>
        <pubdate>20130125</pubdate>
        <title>VisTrails SAHM: visualization and workflow management for species habitat modeling</title>
        <geoform>publication</geoform>
        <serinfo>
          <sername>Ecography</sername>
          <issue>vol. 36, issue 2</issue>
        </serinfo>
        <pubinfo>
          <pubplace>n/a</pubplace>
          <publish>Wiley</publish>
        </pubinfo>
        <othercit>pp. 129-135</othercit>
        <onlink>https://doi.org/10.1111/j.1600-0587.2012.07815.x</onlink>
      </citeinfo>
    </crossref>
    <crossref>
      <citeinfo>
        <origin>R Core Team</origin>
        <pubdate>2025</pubdate>
        <title>R: A Language and Environment for Statistical Computing</title>
        <geoform>application/service</geoform>
        <pubinfo>
          <pubplace>Vienna, Austria</pubplace>
          <publish>R Foundation for Statistical Computing</publish>
        </pubinfo>
        <onlink>https://www.R-project.org</onlink>
      </citeinfo>
    </crossref>
  </idinfo>
  <dataqual>
    <attracc>
      <attraccr>Before ensembling layers, we examined model outputs for quality. We ensured that the training and test Area Under the Curve (AUC) values were greater than 0.7. We also visually assessed the complexity of response curves to identify overfitting. Where these criteria indicated overfitting, we explored alternative model-specific tuning parameters to decrease overfitting while maintaining good performance. We also ensured that the train and test Continuous Boyce Index (CBI) values were greater than 0.5. Model algorithms for each species that did not meet this threshold were assessed individually and, when deemed appropriate, dropped. After layers were ensembled, maps were visualized for consistency and ecological plausibility. We used the hashtag method to split our data into non-random, spatially sampled cross-validation and independent test data to evaluate model performance. This method overlays a hashtag shape (#) on the 99% binary kernel density estimate (KDE) of the occurrence points. Observation points falling within the buffered hashtag are assigned to the test split (and withheld for model evaluation), with a desired ratio of 70% train / 30% test.</attraccr>
    </attracc>
    <logic>We evaluated location data for taxonomic and spatial accuracy. All known synonyms were collected (excluding subspecies, variants, and hybrids) using the Global Biodiversity Information System backbone (ITIS; www.itis.gov) as an authoritative taxonomy in the R library ‘taxize’. We filtered observations by observation type (observation or specimen only), observation date (1980 to present), and coordinate uncertainty (≤ 30 m). Furthermore, we cleaned the species data and corresponding geographic coordinates using the R package “CoordinateCleaner”, which flags potentially erroneous coordinates arising from common known issues in biological collection databases. This included removing records within country capitals, within country centroids, and within a 100 m radius of known biodiversity institutions (which include research centers, universities, herbaria, museums, zoos, and botanical gardens). We also removed records with coordinates that fall in the oceans, records with latitude and longitude both equal to zero, and records with equal latitude and longitude coordinates. We filtered to occurrence records that fell within the stream template for stream models and within the waterbody template for waterbody models. We checked the entire dataset for duplicate records. We compared locations with reported distributions.</logic>
    <complete>Data sets are considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.</complete>
    <posacc>
      <horizpa>
        <horizpar>We filtered our dataset to coordinates with ≤ 30 m accuracy and visually screened our data for positional accuracy.</horizpar>
      </horizpa>
      <vertacc>
        <vertaccr>We filtered our dataset to coordinates with ≤ 30 m accuracy and visually screened our data for positional accuracy.</vertaccr>
      </vertacc>
    </posacc>
    <lineage>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>U.S. Environmental Protection Agency (USEPA)</origin>
            <origin>U.S. Geological Survey</origin>
            <pubdate>2012</pubdate>
            <title>Hydrography Dataset Plus Version 2 High Resolution - NHDPlus HR</title>
            <geoform>vector digital data</geoform>
            <pubinfo>
              <pubplace>n/a</pubplace>
              <publish>US Geological Survey</publish>
            </pubinfo>
            <onlink>https://www.sciencebase.gov/catalog/item/56c38ad8e4b0946c6520aa52</onlink>
            <onlink>https://www.epa.gov/waterdata/get-nhdplus-national-hydrography-dataset-plus-data</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <sngdate>
              <caldate>2012</caldate>
            </sngdate>
          </timeinfo>
          <srccurr>date of source data</srccurr>
        </srctime>
        <srccitea>NHDPlus HR</srccitea>
        <srccontr>geospatial extent of freshwater habitats and mean annual flow predictor</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>The University of Georgia - Center for Invasive Species and Ecosystem Health</origin>
            <pubdate>20240319</pubdate>
            <title>Early Detection &amp; Distribution Mapping System (EDDMapS)</title>
            <geoform>tabular digital data</geoform>
            <othercit>EDDMapS. 2021. Early Detection &amp; Distribution Mapping System. The University of Georgia - Center for Invasive Species and Ecosystem Health. Available online at http://www.eddmaps.org/; last accessed March 19, 2024.</othercit>
            <onlink>http://www.eddmaps.org/</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>19800101</begdate>
              <enddate>20201201</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>EDDMAPS</srccitea>
        <srccontr>Species location data</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>Global Biodiversity Information Facility</origin>
            <pubdate>20240225</pubdate>
            <title>Global Biodiversity Information Facility (GBIF)</title>
            <geoform>tabular digital data</geoform>
            <othercit>Derived dataset GBIF.org, accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2023-02-07 and 2024-03-19. Filtered export of GBIF occurrence data.</othercit>
            <onlink>https://doi.org/10.15468/dl.ckybbt</onlink>
            <onlink>https://doi.org/10.15468/dl.p2c8k2</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>19800101</begdate>
              <enddate>20240225</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>GBIF</srccitea>
        <srccontr>Species location data</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>USDI-BLM</origin>
            <pubdate>20210101</pubdate>
            <title>Assessment, Inventory, and Monitoring (AIM) Terrestrial Indicators Calculated Dataset (Lotic Indicators Hub and TerrADat Hub)</title>
            <geoform>tabular digital data</geoform>
            <onlink>https://gbp-blm-egis.hub.arcgis.com/pages/aim</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>19800101</begdate>
              <enddate>20230314</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>AIM Lotic</srccitea>
        <srccontr>Species location data</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>Laís Petri</origin>
            <origin>Evelyn M. Beaury</origin>
            <origin>Jeffrey Corbin</origin>
            <origin>Kristen Peach</origin>
            <origin>Helen Sofaer</origin>
            <origin>Ian S. Pearse</origin>
            <origin>Regan Early</origin>
            <origin>David T. Barnett</origin>
            <origin>Inés Ibáñez</origin>
            <origin>Robert K. Peet</origin>
            <origin>Michael Schafale</origin>
            <origin>Thomas R. Wentworth</origin>
            <origin>James P. Vanderhorst</origin>
            <origin>David N. Zaya</origin>
            <origin>Greg Spyreas</origin>
            <origin>Bethany A. Bradley</origin>
            <pubdate>20230112</pubdate>
            <title>SPCIS: Standardized Plant Community with Introduced Status database</title>
            <geoform>publication</geoform>
            <serinfo>
              <sername>Ecology</sername>
              <issue>vol. 104, issue 3</issue>
            </serinfo>
            <pubinfo>
              <pubplace>n/a</pubplace>
              <publish>Wiley</publish>
            </pubinfo>
            <onlink>https://doi.org/10.1002/ecy.3947</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>19800101</begdate>
              <enddate>20230112</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>SPCIS</srccitea>
        <srccontr>Species location data</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>NatureServe</origin>
            <pubdate>20230419</pubdate>
            <title>iMapInvasives: NatureServe’s online data system supporting strategic invasive species management</title>
            <geoform>tabular digital data</geoform>
            <onlink>http://www.imapinvasives.org</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>19800101</begdate>
              <enddate>20230419</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>iMapInvasives</srccitea>
        <srccontr>Species location data</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>LANDFIRE</origin>
            <origin>Earth Resources Observation and Science Center (EROS)</origin>
            <origin>U.S. Geological Survey</origin>
            <pubdate>20210805</pubdate>
            <title>LANDFIRE Remap 2016 LANDFIRE Reference Database (LFRDB)</title>
            <edition>LF Remap</edition>
            <geoform>raster digital data</geoform>
            <pubinfo>
              <pubplace>Sioux Falls, SD</pubplace>
              <publish>Earth Resources Observation and Science Center (EROS), U.S. Geological Survey</publish>
            </pubinfo>
            <onlink>https://www.landfire.gov</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <sngdate>
              <caldate>2016</caldate>
            </sngdate>
          </timeinfo>
          <srccurr>ground condition</srccurr>
        </srctime>
        <srccitea>Landfire</srccitea>
        <srccontr>Species location data</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>Amy J Benson</origin>
            <origin>Holly J Beck</origin>
            <origin>Matthew E Neilson</origin>
            <origin>Wesley M Daniel</origin>
            <origin>Aidan L Devantier</origin>
            <origin>Amanda Cousins</origin>
            <origin>Anastashia M King</origin>
            <origin>Mason Schermerhorn</origin>
            <origin>Shanamon Tangkulwarodom</origin>
            <pubdate>2023</pubdate>
            <title>Boat ramp locations in the United States of America</title>
            <geoform>dataset</geoform>
            <pubinfo>
              <pubplace>https://www.sciencebase.gov</pubplace>
              <publish>U.S. Geological Survey</publish>
            </pubinfo>
            <onlink>https://doi.org/10.5066/p9zirvf0</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <sngdate>
              <caldate>20220531</caldate>
            </sngdate>
          </timeinfo>
          <srccurr>ground condition</srccurr>
        </srctime>
        <srccitea>Boatramps</srccitea>
        <srccontr>boatramp density pathway</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>Grace Henderson</origin>
            <origin>Demetra A. Williams</origin>
            <origin>Keana S. Shadwell</origin>
            <origin>Cameron J. Reimer</origin>
            <origin>Shelby K. LeClare</origin>
            <origin>Linnea S. Fraser</origin>
            <origin>Peder S. Engelstad</origin>
            <origin>Catherine S. Jarnevich</origin>
            <pubdate>2025</pubdate>
            <title>Freshwater Habitat Environmental Predictor Layers for Waterbodies and Streams across the Contiguous United States</title>
            <geoform>raster digital data</geoform>
            <pubinfo>
              <pubplace>ScienceBase</pubplace>
              <publish>U.S. Geological Survey</publish>
            </pubinfo>
            <onlink>https://doi.org/10.5066/P14JDTTJ</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>1971</begdate>
              <enddate>2059</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>Freshwater predictors</srccitea>
        <srccontr>Raster predictor layers</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>U.S. Geological Survey</origin>
            <pubdate>20240408</pubdate>
            <title>Nonindigenous Aquatic Species Database</title>
            <geoform>tabular digital data</geoform>
            <pubinfo>
              <pubplace>Gainesville, FL</pubplace>
              <publish>U.S. Geological Survey</publish>
            </pubinfo>
            <onlink>http://nas.er.usgs.gov</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>19800101</begdate>
              <enddate>20240408</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>NAS</srccitea>
        <srccontr>Occurrence data and pathways information</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>Janet S Prevey</origin>
            <origin>Cameron J Reimer</origin>
            <origin>Peder Engelstad</origin>
            <origin>Pairsa N Belamaric</origin>
            <origin>Terri Hogan</origin>
            <origin>Jillian M. Laroe</origin>
            <origin>Colter J Mumford</origin>
            <origin>Jennifer L Sieracki</origin>
            <origin>Catherine S Jarnevich</origin>
            <pubdate>20250609</pubdate>
            <title>Spatial data layers for a site prioritization tool for invasive species</title>
            <geoform>Dataset</geoform>
            <pubinfo>
              <pubplace>https://www.sciencebase.gov</pubplace>
              <publish>U.S. Geological Survey</publish>
            </pubinfo>
            <onlink>https://doi.org/10.5066/p13mg455</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>1981</begdate>
              <enddate>2023</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>human transport risk</srccitea>
        <srccontr>human_transport_risk raster layer for pathway summary</srccontr>
      </srcinfo>
      <srcinfo>
        <srccite>
          <citeinfo>
            <origin>Bethany Bradley</origin>
            <origin>Annette Evans</origin>
            <origin>Catherine S Jarnevich</origin>
            <origin>Evelyn Beaury</origin>
            <origin>Peder S Engelstad</origin>
            <origin>Nathan B Teich</origin>
            <origin>Jillian M Laroe</origin>
            <pubdate>20240426</pubdate>
            <title>US non-native plant occurrence and abundance data and distribution maps for Eastern US species with current and future climate</title>
            <geoform>Dataset</geoform>
            <pubinfo>
              <pubplace>https://www.sciencebase.gov</pubplace>
              <publish>U.S. Geological Survey</publish>
            </pubinfo>
            <onlink>https://doi.org/10.5066/p14vvres</onlink>
          </citeinfo>
        </srccite>
        <typesrc>Digital and/or Hardcopy</typesrc>
        <srctime>
          <timeinfo>
            <rngdates>
              <begdate>1980</begdate>
              <enddate>2023</enddate>
            </rngdates>
          </timeinfo>
          <srccurr>observed</srccurr>
        </srctime>
        <srccitea>Bradley et al.</srccitea>
        <srccontr>Occurrence data</srccontr>
      </srcinfo>
      <procstep>
        <procdesc>Species location data were acquired from online occurrence data repositories using data_pull R scripts. Sources included the Nonindigenous Aquatic Species (NAS) database, the Global Biodiversity Information Facility (GBIF), the Bureau of Land Management’s (BLM) Assessment, Inventory and Management (AIM) Lotic database, the Early Detection and Distribution Mapping System (EDDMapS), the Standardized Plant Community with Introduced Status database (SPCIS), a plant abundance dataset from Bradley et al. 2024, the iMapInvasives database, and the LANDFIRE reference database. AIM data were obtained directly from the BLM by request. See the Logical Consistency Report section for how species location data were further filtered and cleaned to ensure data quality using the data_prep R scripts. Occurrence data were spatially thinned to 1 location per 0.9 kilometer (km) to avoid issues of autocorrelation using the R package “enmSdmX” (Smith et al., 2023). These thinned data were then used as inputs to VisTrails Software for Assisted Habitat Modeling (SAHM, Morisette et al., 2013, https://doi.org/10.1111/j.1600-0587.2012.07815.x) workflows to create each habitat suitability layer.

Predictor layers were from Henderson et al. 2025 (Freshwater predictors) and include predictors capturing climate, topography, land use, hydrologic characteristics, and soils.</procdesc>
        <srcused>EDDMAPS</srcused>
        <srcused>GBIF</srcused>
        <srcused>AIM Lotic</srcused>
        <srcused>SPCIS</srcused>
        <srcused>iMapInvasives</srcused>
        <srcused>Landfire</srcused>
        <srcused>NAS</srcused>
        <srcused>Bradley et al.</srcused>
        <srcused>Freshwater predictors</srcused>
        <procdate>20240713</procdate>
      </procstep>
      <procstep>
        <procdesc>Each observation record was assigned to the stream template or waterbody template from Henderson et al. 2025 (Freshwater predictors) based on spatial location. Where enough data were available, we created a model for each habitat.  

We used the Target Background approach to create background points to use in modeling.  

Species distribution models (SDMs) were fit following modified methods from Jarnevich et al. (2024, https://doi.org/10.3897/neobiota.96.134842) using SAHM (version 2.2.2, Morisette et al., 2013) in the VisTrails software. For species with &gt;143 spatially thinned occurrences, ~70% of occurrences were used to train the model and the other ~30% were withheld to evaluate the model. All models were fit using n-fold spatial cross-validation (where n ranges from 3 to 10). The model algorithms included Random Forest (RF), Multivariate Adaptive Regression Splines (MARS), Maxent (v 3.4.1), Generalized Linear Model (GLM), and Boosted Regression Trees (BRT). See Jarnevich et al. (2024) for additional details. SAHM produces several standard outputs for each model, including assessment metrics for training and test data, rasters that map predicted habitat suitability across geographic space, metrics of predictor importance (change in Area Under the Curve (AUC) values when predictor values are permuted between presence and background locations), and response curves describing how relative habitat suitability varies across the range of values for each predictor.

Finally, we assessed model performance using the continuous Boyce index (CBI), calculated for each habitat per species with the data_prep/boyce_functions.R script. We used the CBI for each algorithm and habitat combination to determine inclusion in the ensemble step (Step 3). Individual models with CBI values of less than 0.50 were assessed and dropped if deemed appropriate. See Species_model_information.csv for which algorithms were excluded from a species' ensemble map.</procdesc>
        <procdate>20250115</procdate>
      </procstep>
      <procstep>
        <procdesc>We created maps of continuous habitat suitability for each species, including maps for the five future scenarios (raster layers 1, 3). These maps represent a weighted ensemble of the algorithms used to generate the models (up to 5 occurrence maps), weighted according to their CBI values and calculated from model output maps using data_postprocessing R scripts. They represent habitat suitability on a 0-100 scale, where lower values represent less predicted suitability for that species and higher values represent greater predicted suitability. These maps were also restricted based on environmental conditions, meaning areas where environmental conditions are outside the environmental range of location points used to build the model are masked from the map (raster layer 2), calculated with data_postprocessing/mess-calc-function.R. For future scenarios, we counted the number of scenarios with environmental conditions outside the range (raster layer 7). We also calculated the maximum and minimum values for each pixel across the five future scenarios and the standard deviation across future scenario predictions.

1) Current occurrence suitability - Continuous value ensemble (XX-ens-current-mean.tif)
2) Restricted current occurrence suitability - Continuous value ensemble with restricted environmental conditions* (XX-ens-current-mean-masked.tif)
3) Future occurrence suitability - Continuous value ensemble (XX-ens-future-mean.tif)
4) Maximum future occurrence suitability - Maximum continuous value ensemble from the five alternative scenarios (XX-ens-future-max-gcm.tif)
5) Minimum future occurrence suitability - Minimum continuous value ensemble from the five alternative scenarios (XX-ens-future-min-gcm.tif) 
6) Standard deviation of future occurrence suitability - Standard deviation of continuous value from each algorithm for the five alternative scenarios (XX-ens-future-stdev.tif)
7) Restricted count - Count of ensembles from each alternative scenario with restricted environmental conditions* (XX-gcm-mask-count.tif)

*Restricted environmental conditions = only display areas where environmental characteristics are inside the range of the values used to develop the model. For example, a location with a minimum winter temperature of 12 C would be outside the range of -10 to 10 C used in model development.</procdesc>
        <procdate>20250430</procdate>
      </procstep>
      <procstep>
        <procdesc>Using data_pathways R scripts, we created spatial layers representing potential pathways of movement, summarized with the included subwatershed vector layer (WBD_HUC12_EasternUS.gpkg) for a study area in the Northeast. The layers include boat ramp density per subwatershed (pathway-boatramps-by-huc12.R), water area per subwatershed (from waterbody template and river template in Freshwater predictors; pathway-perm-water-by-huc12.R), average subwatershed human transport risk (pathway-remoteness-by-huc12.R), anthropogenic connectivity across HUC4 boundaries from features in the NHDPlus HR (pathway-anthropogenic-connectivity-by-huc-4.R), and water connectivity per species per subwatershed based on occurrences and direction of flow from NHDPlus HR (pathway-water-connectivity-by-huc12.R). The final scores were generated using composite-risk-score-species-by-huc12.R. The values for each pathway were rescaled between zero and one across subwatersheds. We then summed the pathway scores for each species, using pathways reported in Species_model_information.csv. We also calculated the average suitability score for current and future conditions for each species and subwatershed. The final risk score was the sum of the combined pathway score, the average current suitability score, and the average future suitability score for each HUC12.</procdesc>
        <srcused>NHDPlus HR</srcused>
        <srcused>Boatramps</srcused>
        <srcused>human transport risk</srcused>
        <procdate>20250430</procdate>
      </procstep>
    </lineage>
  </dataqual>
  <spdoinfo>
    <direct>Vector</direct>
    <ptvctinf>
      <sdtsterm>
        <sdtstype>G-polygon</sdtstype>
      </sdtsterm>
    </ptvctinf>
  </spdoinfo>
  <spref>
    <horizsys>
      <planar>
        <mapproj>
          <mapprojn>Albers Conical Equal Area</mapprojn>
          <albers>
            <stdparll>20.0</stdparll>
            <stdparll>60.0</stdparll>
            <longcm>-96.0</longcm>
            <latprjo>40.0</latprjo>
            <feast>0.0</feast>
            <fnorth>0.0</fnorth>
          </albers>
        </mapproj>
        <planci>
          <plance>row and column</plance>
          <coordrep>
            <absres>98.4693338923368</absres>
            <ordres>98.4693338923368</ordres>
          </coordrep>
          <plandu>meters</plandu>
        </planci>
      </planar>
      <geodetic>
        <horizdn>North_American_Datum_1983</horizdn>
        <ellips>GRS 1980</ellips>
        <semiaxis>6378137.0</semiaxis>
        <denflat>298.2572221010042</denflat>
      </geodetic>
    </horizsys>
  </spref>
  <eainfo>
    <detailed>
      <enttyp>
        <enttypl>Species_model_information.csv</enttypl>
        <enttypd>Comma Separated Value (CSV) file containing data.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>scientific_name</attrlabl>
        <attrdef>Species identity flag in the format genus_species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Scientific name for 159 species in the format genus_species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>common_name</attrlabl>
        <attrdef>Species identity flag as common name associated with the scientific name in scientific_name</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Common name for species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Taxa</attrlabl>
        <attrdef>Taxonomic group the species (scientific_name) belongs to</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>plant</edomv>
            <edomvd>Scientific name in Plantae Kingdom</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>fish</edomv>
            <edomvd>Scientific name classified as fish</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>invertebrate</edomv>
            <edomvd>Scientific name classified as invertebrate</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>area_thresh</attrlabl>
        <attrdef>Area threshold parameter used to define the hashtag shape that defined the test data area</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>Not applicable; indicates a hashtag shape was not used.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.03</rdommin>
            <rdommax>18.0</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>habitat_type</attrlabl>
        <attrdef>Type of freshwater habitat modeled</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>waterbody</edomv>
            <edomvd>Waterbody template used</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>stream</edomv>
            <edomvd>Stream template used</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Notes</attrlabl>
        <attrdef>Any species habitat type model notes related to model_changes or removal.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>&lt;&lt; empty cell &gt;&gt;</edomv>
            <edomvd>No notes were needed</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Freeform text where needed to explain model_changes or removed_algorithms.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>train_pts</attrlabl>
        <attrdef>Number of training points used to fit the model described in habitat_type</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>Not enough points</edomv>
            <edomvd>Not enough training points to fit a model</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>50</rdommin>
            <rdommax>12048</rdommax>
            <attrunit>Count</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>test_pts</attrlabl>
        <attrdef>Number of test points used to assess the model described in habitat_type</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>Not applicable (no test point split)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>61</rdommin>
            <rdommax>4438</rdommax>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>n_splits</attrlabl>
        <attrdef>Number of training data splits for the model</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>5</rdommin>
            <rdommax>10</rdommax>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>bg_pts</attrlabl>
        <attrdef>Number of background points for the model</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>857</rdommin>
            <rdommax>10000</rdommax>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>model_changes</attrlabl>
        <attrdef>Freeform text to describe any model algorithm changes for the species model group.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>&lt;&lt; empty cell &gt;&gt;</edomv>
            <edomvd>No model changes were recorded</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Freeform text to describe any model algorithm specific changes for the species model group where brt is boosted regression tree, glm is generalized linear model, mars is multivariate adaptive regression spline, rf is random forest, and kde is kernel density estimate.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>removed_algorithms</attrlabl>
        <attrdef>Freeform text list of model algorithms removed from the habitat_type</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>List of model algorithms dropped from consideration in the final model ensemble for the specified species' habitat_type, where brt is boosted regression tree, glm is generalized linear model, mars is multivariate adaptive regression spline, rf is random forest, and kde is kernel density estimate.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>min_cbi</attrlabl>
        <attrdef>Minimum continuous Boyce index (CBI) for any algorithm included in the ensemble</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.22</rdommin>
            <rdommax>0.97</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>max_cbi</attrlabl>
        <attrdef>Maximum continuous Boyce index (CBI) for any algorithm included in the ensemble</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.63</rdommin>
            <rdommax>1.0</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>ensemble_cbi</attrlabl>
        <attrdef>Continuous Boyce index (CBI) for the ensemble for the species model habitat type, reported as the CBI based on training data followed by the test data CBI in parentheses.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Continuous Boyce index for the ensemble for the species model habitat type.</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>pathways</attrlabl>
        <attrdef>Freeform text highlighting the pathways applicable to the species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No pathway information collected</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Freeform text highlighting the pathways applicable to the species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>species_code</attrlabl>
        <attrdef>Code generated to represent the species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Code generated based on the scientific_name to represent the species in file names</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>threshold</attrlabl>
        <attrdef>Calculated threshold values to create binary maps</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>Not applicable; no threshold values calculated for the species for the habitat type</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Freeform text of calculated first, fifth, and tenth threshold values</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>Merged_dataset.csv</enttypl>
        <enttypd>Comma Separated Value (CSV) file containing data.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>scientific_name</attrlabl>
        <attrdef>Species identity flag in the format genus_species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Scientific name for modeled species in the format genus_species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>model_type</attrlabl>
        <attrdef>Type of freshwater habitat modeled</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>stream</edomv>
            <edomvd>Data to fit model using stream template</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>waterbody</edomv>
            <edomvd>Data to fit model using waterbody template</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>set</attrlabl>
        <attrdef>Identifies whether the data were used to fit the model (train) or assess the model (test)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>train</edomv>
            <edomvd>Locations used to train the model</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>test</edomv>
            <edomvd>Locations used to assess (test) the model</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>X</attrlabl>
        <attrdef>X coordinate in Albers Equal Area (ESRI:102008)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>-2236332</rdommin>
            <rdommax>2124678</rdommax>
            <attrunit>meter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Y</attrlabl>
        <attrdef>Y coordinate in Albers Equal Area (ESRI:102008)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>-1686461</rdommin>
            <rdommax>1328473</rdommax>
            <attrunit>meter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Split</attrlabl>
        <attrdef>Spatial cross-validation split value assigned to the point</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>test</edomv>
            <edomvd>No split because set = test</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>1</rdommin>
            <rdommax>10</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>responseBinary</attrlabl>
        <attrdef>Flag indicating if presence or background location</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>1</edomv>
            <edomvd>Occurrence location</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>-9998</edomv>
            <edomvd>Background location</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Annual_Mean_Temp</attrlabl>
        <attrdef>mean annual temperature [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>2666</rdommin>
            <rdommax>2997</rdommax>
            <attrunit>degree C * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Mean_Temp_of_Warmest_Quarter</attrlabl>
        <attrdef>mean temperature of warmest quarter [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>2783</rdommin>
            <rdommax>3116</rdommax>
            <attrunit>degree C * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Mean_Temp_of_Coldest_Quarter</attrlabl>
        <attrdef>mean temperature of coldest quarter [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>2571</rdommin>
            <rdommax>2949</rdommax>
            <attrunit>degree C * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Annual_Precip</attrlabl>
        <attrdef>annual precipitation [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>674</rdommin>
            <rdommax>47993</rdommax>
            <attrunit>kg m-2 year-1</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_of_Wettest_Month</attrlabl>
        <attrdef>precipitation of wettest month [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>99</rdommin>
            <rdommax>7483</rdommax>
            <attrunit>kg m-2 year-1</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_of_Driest_Month</attrlabl>
        <attrdef>precipitation of driest month [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>1</rdommin>
            <rdommax>1601</rdommax>
            <attrunit>kg m-2 year-1</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_Seasonality</attrlabl>
        <attrdef>precipitation seasonality [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>62</rdommin>
            <rdommax>1042</rdommax>
            <attrunit>kg m-2</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_of_Wettest_Quarter</attrlabl>
        <attrdef>Mean precipitation of wettest quarter [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>272</rdommin>
            <rdommax>22265</rdommax>
            <attrunit>kg m-2 month-1</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_of_Driest_Quarter</attrlabl>
        <attrdef>mean precipitation of driest quarter [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>22</rdommin>
            <rdommax>5438</rdommax>
            <attrunit>kg m-2 month-1</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_of_Warmest_Quarter</attrlabl>
        <attrdef>mean precipitation of warmest quarter [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>22</rdommin>
            <rdommax>7876</rdommax>
            <attrunit>kg m-2 month-1</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Precipitation_of_Coldest_Quarter</attrlabl>
        <attrdef>mean precipitation of coldest quarter [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>181</rdommin>
            <rdommax>18487</rdommax>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Mean_Monthly_Temp_Range</attrlabl>
        <attrdef>mean diurnal range [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>11</rdommin>
            <rdommax>173</rdommax>
            <attrunit>degree C * 10</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Isothermality</attrlabl>
        <attrdef>isothermality [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>104</rdommin>
            <rdommax>590</rdommax>
            <attrunit>degree C * 10</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Temp_Seasonality</attrlabl>
        <attrdef>temperature seasonality [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>1201</rdommin>
            <rdommax>12599</rdommax>
            <attrunit>degree C/100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Max_Temp_of_Warmest_Month</attrlabl>
        <attrdef>maximum temperature of warmest month [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>2875</rdommin>
            <rdommax>3172</rdommax>
            <attrunit>degree C * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Min_Temp_of_Coldest_Month</attrlabl>
        <attrdef>minimum temperature of coldest month [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>2490</rdommin>
            <rdommax>2933</rdommax>
            <attrunit>degree C/100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Annual_Temp_Range</attrlabl>
        <attrdef>temperature annual range [1981 to 2010]</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>46</rdommin>
            <rdommax>456</rdommax>
            <attrunit>degree C * 10</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Agriculture_Land_Cover</attrlabl>
        <attrdef>Percent cropland in subwatershed [HUC12] in 2023</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>100</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Developed_Land_Cover</attrlabl>
        <attrdef>Percent developed in subwatershed [HUC12] in 2023</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>100</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Forest_Land_Cover</attrlabl>
        <attrdef>Percent forest in subwatershed [HUC12] in 2023</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>100</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Grassland_Land_Cover</attrlabl>
        <attrdef>Percent grassland in subwatershed [HUC12] in 2023</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>100</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Mean_Annual_Stream_Flow</attrlabl>
        <attrdef>Mean annual streamflow</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.0</rdommin>
            <rdommax>685478.88</rdommax>
            <attrunit>cubic feet per second</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Elevation</attrlabl>
        <attrdef>Elevation</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>-84.0</rdommin>
            <rdommax>4163.58</rdommax>
            <attrunit>meter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Monthly_Growing_Season_ET</attrlabl>
        <attrdef>Mean monthly evapotranspiration of the growing season (April through October)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>5</rdommin>
            <rdommax>184200</rdommax>
            <attrunit>millimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Monthly_Summer_ET</attrlabl>
        <attrdef>Mean monthly evapotranspiration of the summer (June through August)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>5</rdommin>
            <rdommax>96157</rdommax>
            <attrunit>millimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Monthly_Spring_ET</attrlabl>
        <attrdef>Mean monthly evapotranspiration of spring months (March through June)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>5</rdommin>
            <rdommax>101000</rdommax>
            <attrunit>millimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Monthly_Early_Spring_ET</attrlabl>
        <attrdef>Mean monthly evapotranspiration of early spring months (March through May)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>5</rdommin>
            <rdommax>70500</rdommax>
            <attrunit>millimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Fall_to_Spring_ET</attrlabl>
        <attrdef>Mean monthly evapotranspiration of fall to spring (October through June)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>5</rdommin>
            <rdommax>170163</rdommax>
            <attrunit>millimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Human_Modification_Index</attrlabl>
        <attrdef>global human modification index</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>65108</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>mTPI</attrlabl>
        <attrdef>multi-scale topographic position index</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>-316</rdommin>
            <rdommax>269</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Median_NDMI</attrlabl>
        <attrdef>median normalized difference moisture index</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>-647</rdommin>
            <rdommax>934</rdommax>
            <attrunit>index</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>SD_NDMI</attrlabl>
        <attrdef>normalized difference moisture index standard deviation</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>426</rdommax>
            <attrunit>standard deviation</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Potential_Water_Deficit</attrlabl>
        <attrdef>potential evapotranspiration</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>45.25</rdommin>
            <rdommax>197.66</rdommax>
            <attrunit>millimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Shore_Density</attrlabl>
        <attrdef>shore length / waterbody area</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>98.38</rdommax>
            <attrunit>m/m2</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Slope</attrlabl>
        <attrdef>slope</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>90</rdommax>
            <attrunit>degree</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Bulk_Density</attrlabl>
        <attrdef>predicted soil bulk density at a 5cm depth (oven dry bulk density)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>11.43</rdommin>
            <rdommax>198</rdommax>
            <attrunit>grams per cubic centimeter * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Calcium_Carbonate_Content</attrlabl>
        <attrdef>percent soil calcium carbonate at a 5cm depth</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.0</rdommin>
            <rdommax>78.0</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Clay_Content</attrlabl>
        <attrdef>predicted soil clay content at a 5cm depth</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.0</rdommin>
            <rdommax>80.77</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Potassium_Content</attrlabl>
        <attrdef>predicted soil potassium (K) content at a 5cm depth</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>294.43</rdommax>
            <attrunit>centimoles per kilogram</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Nitrogen_Content</attrlabl>
        <attrdef>predicted soil nitrogen (N) content at a 5cm depth</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>228.78</rdommax>
            <attrunit>percent * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_pH</attrlabl>
        <attrdef>predicted soil pH measured using 1:1 method (-log10([H+])) at a 5cm depth</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>340.9</rdommin>
            <rdommax>942.3</rdommax>
            <attrunit>pH * 100</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Depth</attrlabl>
        <attrdef>soil depth to any restrictive layer, right-censored at 201cm</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>3.0</rdommin>
            <rdommax>201.0</rdommax>
            <attrunit>centimeter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Sand_Content</attrlabl>
        <attrdef>predicted soil sand content at a 5cm depth (total sand content)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>0.0</rdommin>
            <rdommax>99.88</rdommax>
            <attrunit>percent</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Soil_Organic_Carbon_Content</attrlabl>
        <attrdef>predicted soil organic carbon content at a 5cm depth</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No Data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>7.0</rdommin>
            <rdommax>52617</rdommax>
            <attrunit>percent * 1000</attrunit>
          </rdom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>Assessment_metrics.csv</enttypl>
        <enttypd>Comma-separated values (CSV) file containing model evaluation metrics by species, model type, and algorithm.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>scientific_name</attrlabl>
        <attrdef>Species identity flag in the format genus_species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Scientific name for species in the format genus_species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>model_type</attrlabl>
        <attrdef>Type of freshwater habitat modeled</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>waterbody</edomv>
            <edomvd>model using waterbody template</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>stream</edomv>
            <edomvd>model using stream template</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>algorithm</attrlabl>
        <attrdef>Algorithm used to generate the model being assessed</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>BRT</edomv>
            <edomvd>Boosted regression tree</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>GLM</edomv>
            <edomvd>Generalized linear model</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>MARS</edomv>
            <edomvd>Multivariate adaptive regression splines</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>RF</edomv>
            <edomvd>Random forest</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>Maxent</edomv>
            <edomvd>Maxent</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>masked ensemble</edomv>
            <edomvd>Continuous value ensemble (of all algorithms) with restricted environmental conditions</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>unmasked ensemble</edomv>
            <edomvd>Continuous value ensemble (of all algorithms) with unrestricted environmental conditions</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>metric</attrlabl>
        <attrdef>Model evaluation metrics</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>AUC</edomv>
            <edomvd>Area under the receiver operating characteristic (ROC) curve</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>AUC-PR</edomv>
            <edomvd>Area under the precision-recall curve</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>correlation coefficient</edomv>
            <edomvd>correlation coefficient</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>percent correctly classified</edomv>
            <edomvd>percent correctly classified</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>sensitivity</edomv>
            <edomvd>sensitivity</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>kappa</edomv>
            <edomvd>Kappa Index</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>true skill statistic</edomv>
            <edomvd>true skill statistic</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>CBI</edomv>
            <edomvd>Continuous Boyce Index</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>type</attrlabl>
        <attrdef>Whether a metric was evaluated on test, training, or cross-validation data</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>train</edomv>
            <edomvd>training data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>cross-validation</edomv>
            <edomvd>average across cross-validation splits</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>test</edomv>
            <edomvd>test data</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>value</attrlabl>
        <attrdef>Value of model evaluation metric</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Possible value ranges differ based on the metric defined in the 'metric' column</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>Variable_importance.csv</enttypl>
        <enttypd>Comma-separated values (CSV) file containing variable importance values by species, model type, algorithm, and predictor.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>scientific_name</attrlabl>
        <attrdef>Species identity flag in the format genus_species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Scientific name for species in the format genus_species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>model_type</attrlabl>
        <attrdef>Type of freshwater habitat modeled</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>waterbody</edomv>
            <edomvd>model using waterbody template</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>stream</edomv>
            <edomvd>model using stream template</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>algorithm</attrlabl>
        <attrdef>Algorithm used to generate the model the variable importance values are for</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>BRT</edomv>
            <edomvd>Boosted regression tree</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>GLM</edomv>
            <edomvd>Generalized linear model</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>MARS</edomv>
            <edomvd>Multivariate adaptive regression splines</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>RF</edomv>
            <edomvd>Random forest</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>Maxent</edomv>
            <edomvd>Maxent</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>predictor</attrlabl>
        <attrdef>Name of predictor that variable importance value is associated with for the model specified by the model_type and algorithm columns</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Predictor variables defined as columns in Merged_dataset</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>AUC_difference</attrlabl>
        <attrdef>Calculated change in AUC for the model when predictor values are permuted between presence and background</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>-0.944</rdommin>
            <rdommax>0.789</rdommax>
            <attrunit>area under the curve difference</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>split</attrlabl>
        <attrdef>Cross-validation split number, or 'train' for all training data, with which the variable importance value is associated</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Split number from 1 to 10 or train</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>Risk_and_pathway_scores.csv</enttypl>
        <enttypd>Comma-separated values (CSV) file containing risk and pathway scores by HUC12 and species.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>huc12</attrlabl>
        <attrdef>12-digit Hydrologic Unit Code (HUC12)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <codesetd>
            <codesetn>12 digit Hydrologic Unit Code (HUC)</codesetn>
            <codesets>Watershed Boundary Dataset, USGS</codesets>
          </codesetd>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>scientific_name</attrlabl>
        <attrdef>Species identity flag in the format genus_species</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>Data are applicable to the HUC12 and are not species-specific</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Scientific name for species in the format genus_species</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>pathway</attrlabl>
        <attrdef>Pathway that the value is for</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>combined pathway score</edomv>
            <edomvd>Sum of the normalized pathway scores (each scaled 0 to 1) applicable to the specified species (scientific_name), with the summed score renormalized to range from 0 to 1</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>current habitat</edomv>
            <edomvd>Mean current suitability for each species by huc12 from the raster 'SPECIES_ens-current-mean-masked', normalized by species across the study area (0 to 1)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>future habitat</edomv>
            <edomvd>Mean future suitability for each species by huc12 from the raster 'SPECIES_ens-future-mean', normalized by species across the study area (0 to 1)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>combined risk score</edomv>
            <edomvd>Sum of combined pathway score, current habitat, and future habitat with values ranging from 0 to 3.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>water connectivity</edomv>
            <edomvd>Distance (meters) from the closest known occurrence of the species to this HUC12 along waterways within the NHDPlus HR, to capture natural spread</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>boat ramp density</edomv>
            <edomvd>Number of boat ramps / (area of waterbody template + area of stream template), to capture hitchhiking on boats, trailers, and equipment</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>water area</edomv>
            <edomvd>Ratio of waterbody area (defined by waterbody template) to HUC12 area, to capture hitchhiking on or in waterfowl</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>human transport index</edomv>
            <edomvd>Sum of relative human transport risk scores across the HUC12, weighted by area of the HUC12, to capture release or escape from captivity, water gardens, aquaculture, or live bait</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>anthropogenic connectivity</edomv>
            <edomvd>Count of human-built water infrastructure (e.g., canals, water pipes) crossing HUC4 boundaries, to capture spread via artificial connections</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>value</attrlabl>
        <attrdef>Value for the metric described by pathway</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Range of values depends on the pathway, either 0 to 1 or 0 to 3</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>WBD_HUC12_EasternUS.gpkg</enttypl>
        <enttypd>GeoPackage file with polygons representing 12-digit hydrologic units used in analyses</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>huc12</attrlabl>
        <attrdef>12-digit Hydrologic Unit Code (HUC12)</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <codesetd>
            <codesetn>12 digit Hydrologic Unit Code (HUC)</codesetn>
            <codesets>Watershed Boundary Dataset, USGS</codesets>
          </codesetd>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>necasc</attrlabl>
        <attrdef>Flag identifying HUC12s within risk score study area</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>0</edomv>
            <edomvd>Not within risk score study area</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>1</edomv>
            <edomvd>Within risk score study area</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
    </detailed>
  </eainfo>
  <distinfo>
    <distrib>
      <cntinfo>
        <cntperp>
          <cntper>GS ScienceBase</cntper>
          <cntorg>U.S. Geological Survey</cntorg>
        </cntperp>
        <cntaddr>
          <addrtype>mailing address</addrtype>
          <address>Denver Federal Center, Building 810, Mail Stop 302</address>
          <city>Denver</city>
          <state>CO</state>
          <postal>80225</postal>
          <country>United States</country>
        </cntaddr>
        <cntvoice>1-888-275-8747</cntvoice>
        <cntemail>sciencebase@usgs.gov</cntemail>
      </cntinfo>
    </distrib>
    <distliab>Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data for other purposes, nor on all computer systems, nor shall the act of distribution constitute any such warranty. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.</distliab>
    <stdorder>
      <digform>
        <digtinfo>
          <formname>Digital Data</formname>
        </digtinfo>
        <digtopt>
          <onlinopt>
            <computer>
              <networka>
                <networkr>https://doi.org/10.5066/P13JMOQW</networkr>
              </networka>
            </computer>
          </onlinopt>
        </digtopt>
      </digform>
      <fees>None</fees>
    </stdorder>
  </distinfo>
  <metainfo>
    <metd>20260123</metd>
    <metc>
      <cntinfo>
        <cntperp>
          <cntper>FORT Data Management</cntper>
          <cntorg>USGS Fort Collins Science Center</cntorg>
        </cntperp>
        <cntpos>FORT Data Management</cntpos>
        <cntaddr>
          <addrtype>mailing address</addrtype>
          <address>2150 Centre Avenue Bldg C</address>
          <city>Fort Collins</city>
          <state>CO</state>
          <postal>80526</postal>
          <country>US</country>
        </cntaddr>
        <cntvoice>970-226-9100</cntvoice>
        <cntemail>fortdatamanagement@usgs.gov</cntemail>
      </cntinfo>
    </metc>
    <metstdn>FGDC Biological Data Profile of the Content Standard for Digital Geospatial Metadata</metstdn>
    <metstdv>FGDC-STD-001.1-1999</metstdv>
  </metainfo>
</metadata>
