<?xml version='1.0' encoding='UTF-8'?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Biplab Poudel</origin>
        <origin>Jiacheng Xie</origin>
        <origin>Congyu Guo</origin>
        <origin>Dong Xu</origin>
        <origin>Rishi J. Patel</origin>
        <origin>Olivia Watt</origin>
        <origin>Erin Pulster</origin>
        <origin>Jeffery A. Steevens</origin>
        <pubdate>20250811</pubdate>
        <title>Images of two standard crude oils collected using a fluorescent camera device to train and optimize a machine learning model for real-time oil spill concentration assessment collected from November 7, 2023, to July 8, 2024</title>
        <geoform>tabular digital data</geoform>
        <pubinfo>
          <pubplace>Reston, VA</pubplace>
          <publish>U.S. Geological Survey</publish>
        </pubinfo>
        <othercit>Author Open Researcher and Contributor ID (ORCID) identifiers are as follows: Biplab Poudel: 0009-0006-8636-1449; Jiacheng Xie: 0000-0003-3733-4349; Congyu Guo: 0009-0003-9939-6788; Dong Xu: 0000-0002-4809-0514; Rishi J. Patel: 0009-0008-8135-2177; Olivia Watt: 0000-0001-8110-3551; Erin Pulster: 0000-0003-4574-8613; Jeffery A. Steevens: 0000-0003-3946-1229</othercit>
        <onlink>https://doi.org/10.5066/P1SXVZX2</onlink>
        <lworkcit>
          <citeinfo>
            <origin>Biplab Poudel</origin>
            <origin>Jiacheng Xie</origin>
            <origin>Congyu Guo</origin>
            <origin>Olivia Watt</origin>
            <origin>Erin Pulster</origin>
            <origin>Rishi J. Patel</origin>
            <origin>Jeffery Steevens</origin>
            <origin>Dong Xu</origin>
            <pubdate>20250701</pubdate>
            <title>Real-Time Oil Spill Concentration Assessment through Fluorescence Imaging and Deep Learning</title>
            <geoform>publication</geoform>
            <pubinfo>
              <pubplace>n/a</pubplace>
              <publish>Elsevier BV</publish>
            </pubinfo>
            <othercit>ppg. 139374</othercit>
            <onlink>https://doi.org/10.1016/j.jhazmat.2025.139374</onlink>
          </citeinfo>
        </lworkcit>
      </citeinfo>
    </citation>
    <descript>
      <abstract>The data are a set of fluorescence images generated to support the development of a machine learning model. The approach combines fluorescence imaging, deep learning, a mobile application, and a data management system for automated, real-time oil spill assessment. The dataset comprises 1,530 fluorescence images of two distinct oil types, a naphthalenic crude oil (NACO) and an aromatic-naphthalenic crude oil (ANCO). The oil was diluted in hexane, and the images represent concentrations ranging from 0 to 500 mg/L. The data are presented as JPEG files in two zip folders (one for each oil type), along with a CSV file that describes the type and concentration of the oil photographed in each image. These images were used to train and evaluate a machine learning tool composed of a convolutional neural network architecture for feature extraction coupled with a custom regression model. The model description and code can be found at https://github.com/biplabpoudel25/Oil-spill-estimation.</abstract>
      <purpose>The images were collected to support the development of a machine learning model that can estimate the concentration of oil in water, sediment, and soil during an oil spill. We aim to use an application-specific device for field measurement of oil during a spill. The images are captured with the device and then analyzed using the machine learning model that interprets the samples. These images can be used to test and further develop machine learning models for oil analysis. The tool and machine learning model enable rapid, cost-effective field measurements with robust data tracking and analysis capabilities.</purpose>
    </descript>
    <timeperd>
      <timeinfo>
        <rngdates>
          <begdate>20231107</begdate>
          <enddate>20240708</enddate>
        </rngdates>
      </timeinfo>
      <current>observed</current>
    </timeperd>
    <status>
      <progress>Complete</progress>
      <update>None planned</update>
    </status>
    <spdom>
      <descgeog>Oil sample sources</descgeog>
      <bounding>
        <westbc>46.8234</westbc>
        <eastbc>51.0750</eastbc>
        <northbc>40.5095</northbc>
        <southbc>40.1858</southbc>
      </bounding>
    </spdom>
    <keywords>
      <theme>
        <themekt>ISO 19115 Topic Category</themekt>
        <themekey>biota</themekey>
      </theme>
      <theme>
        <themekt>USGS Thesaurus</themekt>
        <themekey>Industrial pollution</themekey>
        <themekey>petroleum</themekey>
        <themekey>image analysis</themekey>
      </theme>
      <theme>
        <themekt>National Agricultural Library Thesaurus</themekt>
        <themekey>artificial intelligence</themekey>
      </theme>
      <theme>
        <themekt>None</themekt>
        <themekey>machine learning</themekey>
      </theme>
      <theme>
        <themekt>USGS Metadata Identifier</themekt>
        <themekey>USGS:689a01fdd4be02504d348c18</themekey>
      </theme>
      <place>
        <placekt>Geographic Names Information System</placekt>
        <placekey>Azerbaijan</placekey>
      </place>
      <place>
        <placekt>None</placekt>
        <placekey>Columbia Environmental Research Center</placekey>
      </place>
    </keywords>
    <accconst>None. Please see 'Distribution Info' for details.</accconst>
    <useconst>These data are marked with a Creative Commons CC0 1.0 Universal License. These data are in the public domain and do not have any use constraints. It is requested that the authors be cited in any subsequent publications that reference this dataset. Users are advised to read the dataset's metadata thoroughly to understand appropriate use and data limitations.</useconst>
    <ptcontac>
      <cntinfo>
        <cntperp>
          <cntper>Jeffery A. Steevens</cntper>
          <cntorg>USGS - SOUTHEAST REGION</cntorg>
        </cntperp>
        <cntpos>Center Director/Supervisory Toxicologist</cntpos>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>Columbia Environmental Res Ctr, CERC - R1 Research Building</address>
          <city>Columbia</city>
          <state>MO</state>
          <postal>65201</postal>
        </cntaddr>
        <cntvoice>573-876-1819</cntvoice>
        <cntemail>jsteevens@usgs.gov</cntemail>
      </cntinfo>
    </ptcontac>
    <datacred>This study was funded by the USGS Ecosystems Mission Area and the DOI Inland Oil Spill Preparedness Program.</datacred>
    <native>Windows 11 version 23H2, build 22631.5624; Notepad++ 64-bit x64 version 8.8.3</native>
    <crossref>
      <citeinfo>
        <origin>Biplab Poudel</origin>
        <pubdate>2025</pubdate>
        <title>Real-Time Oil Spill Concentration Assessment through Fluorescence Imaging and Deep Learning</title>
        <geoform>application/service</geoform>
        <onlink>https://github.com/biplabpoudel25/Oil-spill-estimation</onlink>
      </citeinfo>
    </crossref>
    <crossref>
      <citeinfo>
        <origin>D. P. Liu</origin>
        <origin>M. Liu</origin>
        <origin>G. Y. Sun</origin>
        <origin>Z. Q. Zhou</origin>
        <origin>D. L. Wang</origin>
        <origin>F. He</origin>
        <origin>J. X. Li</origin>
        <origin>J. C. Xie</origin>
        <origin>R. Gettler</origin>
        <origin>E. Brunson</origin>
        <origin>J. Steevens</origin>
        <origin>D. Xu</origin>
        <pubdate>2023</pubdate>
        <title>Assessing Environmental Oil Spill Based on Fluorescence Images of Water Samples and Deep Learning</title>
        <geoform>publication</geoform>
        <serinfo>
          <sername>JOURNAL OF ENVIRONMENTAL INFORMATICS</sername>
          <issue>42(1)</issue>
        </serinfo>
        <onlink>https://doi.org/10.3808/jei.202300491</onlink>
      </citeinfo>
    </crossref>
  </idinfo>
  <dataqual>
    <attracc>
      <attraccr>Images were collected at room temperature and under ambient light conditions. The image collection device ensures that no exterior light affects the image collection. Samples were held in a way that minimized loss of chemical, including tight seals on containers, limited exposure to UV light, and storage at room temperature. The instrument was fully charged at the time of image collection to ensure adequate lighting and camera function.</attraccr>
    </attracc>
    <logic>Images are representative of oil sample fluorescence. The images were visually inspected and then used for model calibration.</logic>
    <complete>Data set is considered complete for the information presented, as described in the abstract. Users are advised to read the rest of the metadata record carefully for additional details.</complete>
    <posacc>
      <horizpa>
        <horizpar>No formal positional accuracy tests were conducted.</horizpar>
      </horizpa>
      <vertacc>
        <vertaccr>No formal positional accuracy tests were conducted.</vertaccr>
      </vertacc>
    </posacc>
    <lineage>
      <procstep>
        <procdesc>The dataset comprises a comprehensive collection of oil spill sample images. We used two distinct oil types, measured with a fluorescence imaging device, to calibrate a machine learning model to measure oil during a spill. The two oils were a naphthalenic crude oil (NACO) and an aromatic-naphthalenic crude oil (ANCO) that were obtained from ONTA (Baku, Azerbaijan). The NACO was obtained from the Gunashli oilfield offshore from Baku, Azerbaijan, in the Caspian Sea (Latitude 40.185799, Longitude 51.075001). The ANCO was obtained from a source in Naftalan City, Azerbaijan (Latitude 40.509455, Longitude 46.823391). Oil samples were prepared in hexane and diluted to concentrations ranging from 0 to 500 mg/L. Samples were placed in a 4.5 mL polystyrene cuvette with a pathlength of 10 mm. The oil samples were measured using a custom-built, portable imaging device that records fluorescence after the sample is illuminated with a 425 nm UV LED for less than 1 second. The image was captured using an OmniVision model OV2640 camera. The emission captured in the images is typically blue, gray, or purple, spanning approximately 450 to 700 nm. The images were processed using a machine learning model developed for this application. The model was validated under two experimental settings: (1) cross-dataset evaluation, where training was done on NACO and testing on ANCO, and (2) combined-data evaluation, using both oil types for training and testing. In the cross-dataset evaluation, the 918 NACO images were split for training (643 images) and testing (275 images). This model was further tested using the 612 ANCO images. In the combined-data evaluation, all 1,530 images were combined, and then 1,224 images were used to train the model and 306 images were used for testing. Source code for this work is available at https://github.com/biplabpoudel25/Oil-spill-estimation or can be obtained from Dong Xu, xudong@missouri.edu.</procdesc>
        <procdate>20231107</procdate>
      </procstep>
    </lineage>
  </dataqual>
  <eainfo>
    <detailed>
      <enttyp>
        <enttypl>Oil_flouresc_fileNames.csv</enttypl>
        <enttypd>Comma-separated values (CSV) file containing concentration data and file names for images from a fluorescence imaging device. Images were used to train a machine learning tool to estimate oil concentration.</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
      <attr>
        <attrlabl>OilType</attrlabl>
        <attrdef>An acronym used to identify the oil imaged in the file.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>NACO</edomv>
            <edomvd>Naphthalenic crude oil (NACO) obtained from the Gunashli oilfield offshore from Baku, Azerbaijan, in the Caspian Sea (Latitude 40.185799, Longitude 51.075001)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
        <attrdomv>
          <edom>
            <edomv>ANCO</edomv>
            <edomvd>Aromatic-naphthalenic crude oil (ANCO) obtained from Naftalan City, Azerbaijan (Latitude 40.509455, Longitude 46.823391)</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Conc</attrlabl>
        <attrdef>The nominal concentration of oil prepared in hexane to collect fluorescence images for calibrating the machine learning model, expressed in milligrams per liter.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <rdom>
            <rdommin>0</rdommin>
            <rdommax>500</rdommax>
            <attrunit>milligrams per liter</attrunit>
          </rdom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>Units</attrlabl>
        <attrdef>The unit of measure for the 'Conc' attribute.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <edom>
            <edomv>mg/L</edomv>
            <edomvd>milligrams per liter</edomvd>
            <edomvds>Producer defined</edomvds>
          </edom>
        </attrdomv>
      </attr>
      <attr>
        <attrlabl>FileName</attrlabl>
        <attrdef>The literal name of the Joint Photographic Experts Group (JPEG) file of the oil at the indicated concentration created by the fluorescence imaging device. The file number uses the following format ABCDEFGHIJKLMN where ABCD = year, EF = month, GH = day, IJKLMN = image number.</attrdef>
        <attrdefs>Producer defined</attrdefs>
        <attrdomv>
          <udom>The file number uses the following format ABCDEFGHIJKLMN where ABCD = year, EF = month, GH = day, IJKLMN = image number</udom>
        </attrdomv>
      </attr>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>ANCO.zip</enttypl>
        <enttypd>A zipped (.zip) folder containing the images of the ANCO oil type described in Oil_flouresc_fileNames.csv.</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
    </detailed>
    <detailed>
      <enttyp>
        <enttypl>NACO.zip</enttypl>
        <enttypd>A zipped (.zip) folder containing the images of the NACO oil type described in Oil_flouresc_fileNames.csv.</enttypd>
        <enttypds>Producer defined</enttypds>
      </enttyp>
    </detailed>
    <overview>
      <eaover>This data release comprises two ZIP folders (ANCO.zip and NACO.zip) containing images in JPEG format. The images are of two types of oil at varying concentrations captured by a fluorescence imaging device. A comma-separated values (CSV) file describes the concentration and oil type of each image.</eaover>
      <eadetcit>NA</eadetcit>
    </overview>
  </eainfo>
  <distinfo>
    <distrib>
      <cntinfo>
        <cntperp>
          <cntper>GS ScienceBase</cntper>
          <cntorg>U.S. Geological Survey</cntorg>
        </cntperp>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>Denver Federal Center, Building 810, Mail Stop 302</address>
          <city>Denver</city>
          <state>CO</state>
          <postal>80225</postal>
          <country>United States</country>
        </cntaddr>
        <cntvoice>1-888-275-8747</cntvoice>
        <cntemail>sciencebase@usgs.gov</cntemail>
      </cntinfo>
    </distrib>
    <distliab>Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the U.S. Geological Survey (USGS), no warranty expressed or implied is made regarding the display or utility of the data on any other system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty.</distliab>
  </distinfo>
  <metainfo>
    <metd>20250815</metd>
    <metc>
      <cntinfo>
        <cntperp>
          <cntper>CERC Data Manager</cntper>
          <cntorg>U.S. Geological Survey, Columbia Environmental Research Center</cntorg>
        </cntperp>
        <cntpos>Natural Resource Data Manager</cntpos>
        <cntaddr>
          <addrtype>mailing and physical</addrtype>
          <address>4200 New Haven Road</address>
          <city>Columbia</city>
          <state>MO</state>
          <postal>65201</postal>
          <country>United States</country>
        </cntaddr>
        <cntvoice>573-875-5399</cntvoice>
        <cntemail>gs-mw-cerc_data_manager@usgs.gov</cntemail>
      </cntinfo>
    </metc>
    <metstdn>FGDC Biological Data Profile of the Content Standard for Digital Geospatial Metadata</metstdn>
    <metstdv>FGDC-STD-001.1-1999</metstdv>
  </metainfo>
</metadata>
