<?xml version='1.0' encoding='UTF-8'?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Aaron W. Aunins</origin>
        <origin>Jing Zhang</origin>
        <origin>Qian Cong</origin>
        <origin>Nick V. Grishin</origin>
        <pubdate>20240703</pubdate>
        <title>SNP data for Karner blue butterfly samples from across the species range</title>
        <geoform>tabular digital data</geoform>
        <pubinfo>
          <pubplace>Reston, VA</pubplace>
          <publish>U.S. Geological Survey</publish>
        </pubinfo>
        <onlink>https://doi.org/10.5066/P143CSPQ</onlink>
        <lworkcit>
          <citeinfo>
            <origin>Aaron W. Aunins</origin>
            <origin>Jing Zhang</origin>
            <origin>Qian Cong</origin>
            <origin>Nick V. Grishin</origin>
            <pubdate>2024</pubdate>
            <title>Range-wide population genomic structure of the Karner blue butterfly, Plebejus (Lycaeides) samuelis</title>
            <geoform>publication</geoform>
          </citeinfo>
        </lworkcit>
      </citeinfo>
    </citation>
    <descript>
      <abstract>This dataset contains sample information describing the origin of the sequenced butterflies and their corresponding SNP genotype data. These data were used to examine patterns of genomic structure in the Karner blue butterfly throughout its range. The raw sequence data are archived in GenBank under BioProject PRJNA995790 at: https://www.ncbi.nlm.nih.gov/bioproject/</abstract>
      <purpose>Data were collected to examine patterns of population genomic structure in the Karner blue butterfly.</purpose>
    </descript>
    <timeperd>
      <timeinfo>
        <rngdates>
          <begdate>20100701</begdate>
          <enddate>20120731</enddate>
        </rngdates>
      </timeinfo>
      <current>ground condition</current>
    </timeperd>
    <status>
      <progress>Complete</progress>
      <update>None planned</update>
    </status>
    <spdom>
      <descgeog>World</descgeog>
      <bounding>
        <westbc>-107.9297</westbc>
        <eastbc>-68.5547</eastbc>
        <northbc>48.9225</northbc>
        <southbc>29.2289</southbc>
      </bounding>
    </spdom>
    <keywords>
      <theme>
        <themekt>ISO 19115 Topic Category</themekt>
        <themekey>biota</themekey>
      </theme>
      <theme>
        <themekt>None</themekt>
        <themekey>Karner blue butterfly</themekey>
      </theme>
      <theme>
        <themekt>USGS Metadata Identifier</themekt>
        <themekey>USGS:661ea223d34e7eb9eb7e3d0a</themekey>
      </theme>
    </keywords>
    <taxonomy>
      <keywtax>
        <taxonkt>None</taxonkt>
        <taxonkey>Lycaeides melissa samuelis</taxonkey>
      </keywtax>
      <taxoncl>
        <taxonrn>Kingdom</taxonrn>
        <taxonrv>Animalia</taxonrv>
        <common>animals</common>
        <taxoncl>
          <taxonrn>Subkingdom</taxonrn>
          <taxonrv>Bilateria</taxonrv>
          <common>triploblasts</common>
          <taxoncl>
            <taxonrn>Infrakingdom</taxonrn>
            <taxonrv>Protostomia</taxonrv>
            <taxoncl>
              <taxonrn>Superphylum</taxonrn>
              <taxonrv>Ecdysozoa</taxonrv>
              <taxoncl>
                <taxonrn>Phylum</taxonrn>
                <taxonrv>Arthropoda</taxonrv>
                <common>arthropods</common>
                <taxoncl>
                  <taxonrn>Subphylum</taxonrn>
                  <taxonrv>Hexapoda</taxonrv>
                  <common>hexapods</common>
                  <taxoncl>
                    <taxonrn>Class</taxonrn>
                    <taxonrv>Insecta</taxonrv>
                    <common>insects</common>
                    <taxoncl>
                      <taxonrn>Subclass</taxonrn>
                      <taxonrv>Pterygota</taxonrv>
                      <common>winged insects</common>
                      <taxoncl>
                        <taxonrn>Infraclass</taxonrn>
                        <taxonrv>Neoptera</taxonrv>
                        <common>modern, wing-folding insects</common>
                        <taxoncl>
                          <taxonrn>Superorder</taxonrn>
                          <taxonrv>Holometabola</taxonrv>
                          <taxoncl>
                            <taxonrn>Order</taxonrn>
                            <taxonrv>Lepidoptera</taxonrv>
                            <common>butterflies</common>
                            <common>moths</common>
                            <taxoncl>
                              <taxonrn>Superfamily</taxonrn>
                              <taxonrv>Papilionoidea</taxonrv>
                              <common>butterflies</common>
                              <taxoncl>
                                <taxonrn>Family</taxonrn>
                                <taxonrv>Lycaenidae</taxonrv>
                                <common>blues</common>
                                <common>coppers</common>
                                <common>gossamer-winged butterflies</common>
                                <common>hairstreaks</common>
                                <common>harvesters</common>
                                <common>Gossamer-wing Butterflies</common>
                                <taxoncl>
                                  <taxonrn>Subfamily</taxonrn>
                                  <taxonrv>Polyommatinae</taxonrv>
                                  <common>Blues</common>
                                  <taxoncl>
                                    <taxonrn>Tribe</taxonrn>
                                    <taxonrv>Polyommatini</taxonrv>
                                    <taxoncl>
                                      <taxonrn>Genus</taxonrn>
                                      <taxonrv>Plebejus</taxonrv>
                                      <taxoncl>
                                        <taxonrn>Subgenus</taxonrn>
                                        <taxonrv>Plebejus (Lycaeides)</taxonrv>
                                        <taxoncl>
                                          <taxonrn>Species</taxonrn>
                                          <taxonrv>Plebejus melissa</taxonrv>
                                          <taxoncl>
                                            <taxonrn>Subspecies</taxonrn>
                                            <taxonrv>Plebejus melissa samuelis</taxonrv>
                                            <common>Karner blue</common>
                                          </taxoncl>
                                        </taxoncl>
                                      </taxoncl>
                                    </taxoncl>
                                  </taxoncl>
                                </taxoncl>
                              </taxoncl>
                            </taxoncl>
                          </taxoncl>
                        </taxoncl>
                      </taxoncl>
                    </taxoncl>
                  </taxoncl>
                </taxoncl>
              </taxoncl>
            </taxoncl>
          </taxoncl>
        </taxoncl>
      </taxoncl>
    </taxonomy>
    <accconst>None. Please see 'Distribution Info' for details.</accconst>
    <useconst>None. Users are advised to read the dataset's metadata thoroughly to understand appropriate use and data limitations.</useconst>
    <ptcontac>
      <cntinfo>
        <cntperp>
          <cntper>Aaron W. Aunins</cntper>
          <cntorg>U.S. Geological Survey Eastern Ecological Science Center</cntorg>
        </cntperp>
        <cntpos>Research Fish Biologist</cntpos>
        <cntaddr>
          <addrtype>mailing</addrtype>
          <address>11649 Leetown Road</address>
          <city>Kearneysville</city>
          <state>WV</state>
          <postal>25430</postal>
          <country>United States</country>
        </cntaddr>
        <cntvoice>304-724-4480</cntvoice>
        <cntemail>aaunins@usgs.gov</cntemail>
      </cntinfo>
    </ptcontac>
    <datacred>Jing Zhang (jingzhang.first@gmail.com), Qian Cong (qian.cong@utsouthwestern.edu), and Nick Grishin (grishin@chop.swmed.edu): UT Southwestern Medical Center, Dallas, TX.</datacred>
  </idinfo>
  <dataqual>
    <attracc>
      <attraccr>No formal attribute accuracy tests were conducted.</attraccr>
    </attracc>
    <logic>No formal logical accuracy tests were conducted.</logic>
    <complete>Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.</complete>
    <posacc>
      <horizpa>
        <horizpar>A formal accuracy assessment of the horizontal positional information in the data set has not been conducted.</horizpar>
      </horizpa>
      <vertacc>
        <vertaccr>A formal accuracy assessment of the vertical positional information in the data set has either not been conducted or is not applicable.</vertaccr>
      </vertacc>
    </posacc>
    <lineage>
      <procstep>
        <procdesc>Sample collection and DNA extraction:
Karner blue butterflies were non-lethally sampled for genetic analysis from four states in 2010, 2011, and 2012. Samples consisted of wing clips (approximately 3 mm² in surface area) taken from the hind wing, with an effort to include part of the anal vein, and were immediately placed in 180 µL of cell lysis buffer (Qiagen Cat. No. 158116). Samples were shipped on ice to the USGS Eastern Ecological Science Center at the Leetown Research Laboratory (USGS-EESC-LSC), Kearneysville, WV. Samples were stored at -20 °C or -80 °C until DNA extraction. Genomic DNA was extracted using Qiagen Puregene (Qiagen Cat. No. 158023), Qiagen DNeasy (Qiagen Cat. No. 69504), or Macherey-Nagel NucleoSpin Tissue (Macherey-Nagel Cat. No. 740952.50) kits following the manufacturer’s instructions. DNA was quantified using a Qubit dsDNA HS Assay kit (ThermoFisher Cat. No. Q32851). Paired-end libraries were then prepared using the NEBNext Ultra II FS DNA Library Prep kit (New England Biolabs Cat. No. E7805L) and sequenced on an Illumina HiSeq X10 system. Genomic data from other previously sequenced species, including P. anna, P. fridayi, P. idas, and P. melissa, most of which were included in Zhang et al. (2023), were included in this study for comparison with the Karner blue. All Karner blue sequence reads, as well as those from other species, can be retrieved from NCBI BioProject PRJNA995790 and the corresponding BioSample accessions in the accompanying metadata file.

Reference genome construction:
DNA was extracted from a female P. melissa specimen collected from Laramie County, Wyoming in 2021 (NCBI BioSample SAMN37522558). An Oxford Nanopore library was prepared using the LSK110 Ligation kit (Oxford Nanopore Cat. No. SQK-LSK110). The concentration of the library was measured using Qubit, and around 20 fmol was loaded onto an R9.4.1 flow cell. We performed base calling using Guppy and assembled draft genomes using NextDenovo (https://github.com/Nextomics/NextDenovo) and NECAT (Chen et al., 2021a), each with the default settings. We annotated the draft assembly of P. melissa using the homology-based annotation method GeMoMa (Keilwagen et al., 2019) with the default settings, along with the annotation of the P. argus draft genome. We predicted functions for proteins encoded in the P. melissa genome by mapping these proteins to FlyBase (Gramates et al., 2022) with BLASTP (Altschul et al., 1997). Because the Z chromosome is highly conserved in Lepidoptera (Fraisse et al., 2017), we identified possible Z-linked scaffolds by mapping Heliconius Z-linked proteins to our draft genome using TBLASTN (Altschul et al., 1997). The draft P. melissa genome is deposited in NCBI under accession number JBBPCG000000000.

SNP calling: 
We performed SNP calling using our established pipeline described in Cong et al. (2021). Briefly, adapters and low-quality portions of the sequencing reads were trimmed using Trimmomatic-0.36 (Bolger et al., 2014), and overlapping read pairs were merged using PEAR (Stamatakis et al., 2014) with the default settings. The resulting reads were mapped to the P. melissa assembly described above using BWA-MEM (Li, 2013), and we kept only the reads that mapped unambiguously and in the correct orientation. Using the BWA-MEM results, we computed the total sequencing depth in each 100 bp window. We considered windows with unusually high or unusually low total depth to be less reliable: windows with high depth might cover repetitive regions, whereas windows with very low depth may be misassembled or highly variable. Therefore, we used only segments containing at least three consecutive windows (&gt;=300 bp) with depth between 0.25 and 2.5 times the median sequencing depth, and reads mapped outside these segments were discarded. In total, we obtained 331,230,775 “good positions”, around 79% of the assembly size of our P. melissa reference. We performed SNP calling with samtools (Li et al., 2009) for each specimen, using the alignments between reference and reads that remained after the two clean-up steps described above.</procdesc>
        <procdate>20230101</procdate>
      </procstep>
    </lineage>
  </dataqual>
  <eainfo>
    <detailed>
      <enttyp>
        <enttypl>accession_numbers.csv</enttypl>
        <enttypd>The file "accession_numbers" links samples to accessions in GenBank.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>Sample_identifier</attrlabl>
        <attrdef>This identification number represents each individual that was sequenced.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Free text</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Species</attrlabl>
        <attrdef>Current species and subspecies names applied to the specimens.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Free text</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Collection_Date</attrlabl>
        <attrdef>Dates of specimen collection. Formatted as day-month-year, month-year, or unknown.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>unknown</edomv>
            <edomvd>Date of specimen collection is unknown.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <rdom>
            <rdommin>Sept-1899</rdommin>
            <rdommax>24-May-21</rdommax>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Country</attrlabl>
        <attrdef>Country of origin</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>USA</edomv>
            <edomvd>Country of United States of America</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>Canada</edomv>
            <edomvd>Country of Canada</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>Mexico</edomv>
            <edomvd>Country of Mexico</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>State_or_Province</attrlabl>
        <attrdef>State abbreviation or province of collection.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>unknown</edomv>
            <edomvd>State or province of collection is unknown.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>2-letter state abbreviation or full name of province</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>County</attrlabl>
        <attrdef>County name of collection if applicable.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>County is unknown or not applicable.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Free text</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Site_name</attrlabl>
        <attrdef>Name of site of collection when known.</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NA</edomv>
            <edomvd>No site collection name is available or was assigned at collection.</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <udom>Free text</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>BioProject ID</attrlabl>
        <attrdef>NCBI BioProject ID</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>PRJNA995790</edomv>
            <edomvd>NCBI BioProject ID</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Biosample_ID</attrlabl>
        <attrdef>NCBI BioSample ID</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Free text</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>snpcall_result.csv</enttypl>
        <enttypd>The file "snpcall_result" lists the 64 data files and their locations within 4 zipped archive files. The 64 files, listed as "all_sample_snpcall_results_X" (where X is 0-63), contain SNP calls for each butterfly sequenced in this study relative to scaffolds of the Plebejus melissa reference genome (BioProject PRJNA995790 and BioSample SAMN37522497). The identifier of each butterfly specimen is given in the header; the first and second columns are the scaffold name and the nucleotide position on the scaffold, respectively. Each file covers several scaffolds of the reference genome. The complete set of SNPs across all scaffolds can be obtained by merging the files for subsequent analyses.</enttypd>
        <enttypds>Producer Defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>filename</attrlabl>
        <attrdef>List of SNP filenames in .txt format</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <udom>Filenames follow the format of "all_sample_snpcall_result_X.txt", where X ranges from 0-63</udom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>compressed_archive</attrlabl>
        <attrdef>Compressed archive files of snpcall_result files in .7z format</attrdef>
        <attrdefs>Producer Defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>snpcall_result_0-15.7z</edomv>
            <edomvd>Contains snpcall_result files from 0-15</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>snpcall_result_16-30.7z</edomv>
            <edomvd>Contains snpcall_result files from 16-30</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>snpcall_result_31-50.7z</edomv>
            <edomvd>Contains snpcall_result files from 31-50</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>snpcall_result_51-63.7z</edomv>
            <edomvd>Contains snpcall_result files from 51-63</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
    </detailed>
  </eainfo>
  <distinfo>
    <distrib>
      <cntinfo>
        <cntperp>
          <cntper>GS ScienceBase</cntper>
          <cntorg>U.S. Geological Survey</cntorg>
        </cntperp>
        <cntaddr>
          <addrtype>mailing address</addrtype>
          <address>Denver Federal Center, Building 810, Mail Stop 302</address>
          <city>Denver</city>
          <state>CO</state>
          <postal>80225</postal>
          <country>United States</country>
        </cntaddr>
        <cntvoice>1-888-275-8747</cntvoice>
        <cntemail>sciencebase@usgs.gov</cntemail>
      </cntinfo>
    </distrib>
    <distliab>Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.</distliab>
    <stdorder>
      <digform>
        <digtinfo>
          <formname>Digital Data</formname>
        </digtinfo>
        <digtopt>
          <onlinopt>
            <computer>
              <networka>
                <networkr>https://doi.org/10.5066/P143CSPQ</networkr>
              </networka>
            </computer>
          </onlinopt>
        </digtopt>
      </digform>
      <fees>None</fees>
    </stdorder>
  </distinfo>
  <metainfo>
    <metd>20240703</metd>
    <metc>
      <cntinfo>
        <cntperp>
          <cntper>Aaron Aunins</cntper>
          <cntorg>U.S. Geological Survey, Eastern Ecological Science Center</cntorg>
        </cntperp>
        <cntpos>Research Fish Biologist</cntpos>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>11649 Leetown Road</address>
          <city>Kearneysville</city>
          <state>WV</state>
          <postal>25430</postal>
          <country>USA</country>
        </cntaddr>
        <cntvoice>304-724-4480</cntvoice>
        <cntemail>aaunins@usgs.gov</cntemail>
      </cntinfo>
    </metc>
    <metstdn>FGDC Biological Data Profile of the Content Standard for Digital Geospatial Metadata</metstdn>
    <metstdv>FGDC-STD-001.1-1999</metstdv>
  </metainfo>
</metadata>
