Simulated eDNA Occurrence data and Stan summaries of data

Richard A Erickson 20191003 Simulated eDNA Occurrence data and Stan summaries of data tabular digital data Online U.S. Geological Survey https://doi.org/10.5066/P9WRFUDQ

Richard A Erickson Christopher M. Merkes Erica L. Mize 20190901 Sampling designs for landscape-level eDNA monitoring programs using three-level occurrence models publication Integrated Environmental Assessment and Management Wiley https://doi.org/10.1002/ieam.4155

Resource managers conduct landscape-level monitoring using environmental DNA (eDNA). These managers must contend with imperfect detection in samples and sub-samples (i.e., molecular analyses). This imperfect detection impacts their ability to both detect species and estimate occurrence. Although occurrence (synonymously occupancy) models can estimate these probabilities, most models and guidance for their application do not consider three levels. This simulated dataset assumes sites are occupied (probability psi = 1, Z = 1) and simulates sample (probability theta, A = 0, 1) and subsample (probability p, Y = 0, 1) occurrence probabilities and detections (1)/non-detections (0).

These data were simulated to evaluate the ability of a statistical model to recover known parameter values.

Simulated data 20171219 See Supplemental Info Complete None planned -180.0 180.0 90.0 -90.0 World

ISO 19115 Topic Category biota None Marine Realms Information Bank (MRIB) keywords numerical modeling USGS Thesaurus environmental DNA USGS Metadata Identifier USGS:5d96239be4b0c4f70d110ebe

None. Please see 'Distribution Info' for details. None. Users are advised to read the data set's metadata thoroughly to understand appropriate use and data limitations.

Richard A Erickson U.S. Geological Survey, Midwest Region Fish Biologist mailing address

2630 Fanta Reed Road

La Crosse WI 54603 United States 608-781-6353 rerickson@usgs.gov USFWS

Code producing output files is undergoing release as another product.

No formal logical accuracy tests were conducted. Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details. A formal accuracy assessment of the horizontal positional information in the data set has not been conducted. A formal accuracy assessment of the vertical positional information in the data set has either not been conducted, or is not applicable.

These data were simulated in R from the following equations: A_{i,j} | Z_i ~ Bernoulli(Z_i theta_{i,j}) for sample detections and Y_{i,j,k} | A_{i,j} ~ Bernoulli(A_{i,j} p_{i,j}) for subsample detections. A, Z, Y, theta, and p are defined in the abstract. i is the site level, j is the sample level, and k is the subsample level. In total, 800 different parameter combinations were used to simulate 100 realizations of each combination for a total of 800 simulated data sets. The specific parameter combinations used are listed in parameterValues.csv. 20171219

stan Summary data CSVs: Comma Separated Value (CSV) file containing data. File names follow the numbering of a computer language that starts counting at 0, whereas the indexes follow a computer language that starts counting at 1. Each parameter (p, theta, psi) has a “recovered” point estimate, Monte Carlo standard error, and lower and upper bounds of a 95% credibility interval. NAs indicate that the model could not fit a simulated dataset because there were no detections in the simulated data. Producer defined

ParameterIndex: the parameter combination used for the file; it corresponds to the file name + 1. File names follow the numbering of a computer language that starts counting at 0, whereas the indexes follow a computer language that starts counting at 1.
Producer defined 1 Producer defined
pRecovered: the recovered point estimate for parameter p. Producer defined 0 1
thetaRecovered: the recovered point estimate for parameter theta. Producer defined 0 1
psiRecovered: the recovered point estimate for parameter psi. Producer defined 0 1
pRecoveredSE: the Monte Carlo Standard Error for parameter p. Producer defined 0 1
thetaRecoveredSE: the Monte Carlo Standard Error for parameter theta. Producer defined 0 1
psiRecoveredSE: the Monte Carlo Standard Error for parameter psi. Producer defined 0 1
pRecoveredLower: the lower bound of the 95% Credibility Interval for p. Producer defined 0 1
thetaRecoveredLower: the lower bound of the 95% Credibility Interval for theta. Producer defined 0 1
psiRecoveredLower: the lower bound of the 95% Credibility Interval for psi. Producer defined 0 1
pRecoveredUpper: the upper bound of the 95% Credibility Interval for p. Producer defined 0 1
thetaRecoveredUpper: the upper bound of the 95% Credibility Interval for theta. Producer defined 0 1
psiRecoveredUpper: the upper bound of the 95% Credibility Interval for psi. Producer defined 0 1

simulated Data csv: Comma Separated Value (CSV) file containing data. The files were named for a computer language that starts with 0 and the indexes were named for a computer language that starts with 1. Producer defined

parameterIndex: the parameter combination used for the file; it corresponds to the file name + 1. The files were named for a computer language that starts with 0 and the indexes were named for a computer language that starts with 1. Producer defined 1 Producer defined
Zindex: an index for sites. Currently, this is fixed at one, but could be changed if we simulated more sites per sample. Producer defined 1 Producer defined
Aindex: the sample index within each site.
If there were more than 1 site in the simulated dataset, this value would not be unique. Producer defined 1 5
Yindex: the observation index, which is unique for each row. Producer defined 1 5
Y_*: the simulated Y value for replicate *. Producer defined. Zeros are non-detects and ones are detects.
A_*: the simulated A value for replicate *. Producer defined. Zeros are non-detects and ones are detects.
Z_*: the simulated Z value for replicate *. Producer defined. Zeros are non-detects and ones are detects.

U.S. Geological Survey - ScienceBase mailing and physical

Denver Federal Center, Building 810, Mail Stop 302

Denver CO 80225 USA 1-888-275-8747 sciencebase@usgs.gov Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty. 20210523 Richard A Erickson U.S. Geological Survey, Midwest Region Fish Biologist mailing address

2630 Fanta Reed Road

La Crosse WI 54603 United States 608-781-6353 rerickson@usgs.gov FGDC Biological Data Profile of the CSDGM FGDC-STD-001.1-1999 Record created using USGS Metadata Wizard tool (https://github.com/usgs/fort-pymdwizard).
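
The two Bernoulli draws in the process description (sample occurrence A given site occupancy Z, then subsample detection Y given A) can be illustrated with a short sketch. This is not the authors' code: the record states the data were simulated in R and that the code is being released separately. The sketch below uses Python's standard library instead, and the function name simulate_site, its arguments, the dictionary keys, and the default of 5 samples with 5 subsamples each are all assumptions made for illustration.

```python
import random


def simulate_site(theta, p, n_samples=5, n_subsamples=5, z=1, rng=None):
    """Simulate one site under the three-level occurrence model.

    Hypothetical helper, not from the data release:
      A_j   | Z   ~ Bernoulli(Z * theta)   sample-level occurrence
      Y_j,k | A_j ~ Bernoulli(A_j * p)     subsample-level detection
    z defaults to 1 because the record says sites are assumed occupied.
    """
    rng = rng or random.Random()
    rows = []
    for j in range(1, n_samples + 1):
        # eDNA is present in sample j with probability z * theta.
        a = 1 if rng.random() < z * theta else 0
        for k in range(1, n_subsamples + 1):
            # A subsample can only detect eDNA when the sample holds it.
            y = 1 if rng.random() < a * p else 0
            rows.append({"sample": j, "subsample": k, "Z": z, "A": a, "Y": y})
    return rows


if __name__ == "__main__":
    # Seeded generator so repeated runs give the same draws.
    for row in simulate_site(theta=0.6, p=0.4, rng=random.Random(2017)):
        print(row)
```

Each returned row holds one subsample, so a detection (Y = 1) can appear only in rows where A = 1. Setting theta to 0 produces a site with no detections at all, which is the situation the record describes as leading to NAs in the Stan summaries.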