By Data Source
Alaska Science Center
Coastal and Marine Geology Program
Core Science Analytics and Synthesis
Data Preservation, Informatics, and Laboratories
Earth Resources Observation and Science
Earthquake Hazards Program
Forest and Rangeland Ecosystem Science Center
Geo Data Portal Catalog
Great Lakes Science Center
Greater Everglades Ecosystem Restoration (SOFIA)
National Geospatial Program
National Research Program
National Wildlife Health Center
Patuxent Wildlife Research Center
Southeast Ecological Science Center
Upper Midwest Environmental Sciences Center Data
USA National Phenology Network (NPN)
USGS Energy Data Finder
USGS Geoscience Data Catalog
USGS Libraries Program
USGS National Geologic Map Database
Water National Spatial Data Infrastructure Node
The USGS Science Data Catalog:
- Meets White House Open Data reporting requirements for USGS
- Provides a Search & Discovery Tool that allows for metadata retrieval, visualization, download, and linking back to original data providers
- Offers a single source for USGS to serve its metadata to data.doi.gov, Data.gov, OMB, etc.
- Helps ensure that USGS metadata meet minimum requirements
- Supports data managers in applying the Publish/Share element of the USGS Science Data Lifecycle Model
Frequently Asked Questions (FAQ) about the Science Data Catalog
On May 9, 2013, the White House released the Executive Order, "Making Open and Machine Readable the New Default for Government." This Order, built upon an earlier interagency memorandum released by the Office of Science and Technology Policy (OSTP), was accompanied by an Office of Management and Budget (OMB) Policy and a site hosted on GitHub called Project Open Data, to guide implementation.
To meet the requirements outlined in these initiatives, the U.S. Geological Survey (USGS) has undertaken a variety of tasks in concert with the Department of the Interior (DOI) to display USGS open data assets in a searchable public data listing. This listing serves as an official inventory for meeting Office of Management and Budget requirements, and the Science Data Catalog (SDC) serves as a single point of entry for USGS records to display in Department of the Interior and Data.gov indexes, among others.
The Science Data Catalog describes USGS-produced, authoritative data that have undergone the data release process outlined in USGS Fundamental Science Practices. The Catalog includes metadata describing individual datasets, data collections, and observational or remotely-sensed data contained in national systems (rather than records about individual observations). All items in the Science Data Catalog must contain an actionable link to the data or a data service. Links must lead to downloadable datasets, such as a zip file, an individual spreadsheet, a topographic map, or a USGS Data Series Report. Additionally, links may lead to an online data system containing hundreds or even thousands of individual datasets, where users are prompted to execute a more specific search. Finally, the metadata may include links to access the data via APIs and services, including map services.
Prospective data providers (USGS Science Centers and other data-producing/managing organizations) should contact email@example.com for assistance in providing their records to the SDC.
Metadata must be in Extensible Markup Language (XML) format and follow the Federal Geographic Data Committee's (FGDC) endorsed Content Standard for Digital Geospatial Metadata (CSDGM). In the future, the Science Data Catalog will accept metadata adhering to formats prescribed by the International Organization for Standardization (ISO) suite (e.g., 19115-1, 19115-2, 19119, and 19111). Visit the USGS Data Management Web site for more information about metadata creation.
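As a rough pre-submission check, a data provider could confirm that a record is well-formed XML and carries the basic CSDGM elements the Catalog depends on. The sketch below is only an illustration: the function name `check_csdgm_record` and the sample record are hypothetical, the three checks are an assumption about what is worth testing, and it is not the Catalog's own validation. The element names (`metadata`, `idinfo`, `citation`, `citeinfo`, `title`, `onlink`) are the CSDGM short names.

```python
import xml.etree.ElementTree as ET


def check_csdgm_record(xml_text):
    """Return a list of problems found in a CSDGM XML record (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    # CSDGM records use <metadata> as the root element.
    if root.tag != "metadata":
        problems.append(f"root element is <{root.tag}>, expected <metadata>")
    # The dataset title lives under idinfo/citation/citeinfo/title.
    title = root.findtext("idinfo/citation/citeinfo/title") or ""
    if not title.strip():
        problems.append("missing citation title")
    # Catalog items need an actionable link; look for any non-empty <onlink>.
    links = [e.text.strip() for e in root.iter("onlink") if e.text and e.text.strip()]
    if not links:
        problems.append("no onlink (Online Linkage) element found")
    return problems


# Hypothetical sample record, used only for illustration.
SAMPLE = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <title>Example bathymetry dataset</title>
        <onlink>https://example.org/data/bathymetry.zip</onlink>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""

print(check_csdgm_record(SAMPLE))
print(check_csdgm_record("<metadata><idinfo/></metadata>"))
```

A record that passes these checks is not necessarily valid CSDGM; full validation against the standard requires a dedicated metadata validator.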
Yes, you can. To establish a regularly scheduled harvest that runs daily, weekly, or monthly, please e-mail firstname.lastname@example.org.
Updates or changes to metadata must be performed on the source XML metadata record that is harvested by the Science Data Catalog; these records are managed by the program or science center at the point of harvest (WAF, SiteMap or ScienceBase). If you have previously registered a WAF or SiteMap URL as a data source in the Dashboard and have subsequently added records to these sources, please contact email@example.com to request a reharvest to update records reported to the Science Data Catalog. A reharvest will replicate what is provided in the WAF or SiteMap, deleting and updating items, reflecting the records provided in the data source. If you have added additional records to the ScienceBase data source, these records will be automatically transferred to the Science Data Catalog, requiring no additional action from the data provider. Records deleted from ScienceBase will also be deleted from the Science Data Catalog.
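The replication behavior of a reharvest can be pictured as a comparison between what the Catalog currently holds and what the source now provides. The following is a minimal sketch under stated assumptions: the function `plan_reharvest`, the use of file names as record identifiers, and the dictionary layout are all hypothetical and do not describe the Catalog's internal implementation.

```python
def plan_reharvest(catalog, source):
    """Compare Catalog holdings with a source; both map record id -> record content.

    Returns which records a mirror-style reharvest would add, delete, or update.
    """
    catalog_ids = set(catalog)
    source_ids = set(source)
    return {
        # Present in the source but not yet in the Catalog.
        "add": sorted(source_ids - catalog_ids),
        # Removed from the source, so removed from the Catalog as well.
        "delete": sorted(catalog_ids - source_ids),
        # Present in both, but the content differs.
        "update": sorted(i for i in source_ids & catalog_ids if source[i] != catalog[i]),
    }


# Hypothetical holdings keyed by metadata file name.
catalog = {"a.xml": "v1", "b.xml": "v1", "c.xml": "v1"}
source = {"a.xml": "v1", "b.xml": "v2", "d.xml": "v1"}
print(plan_reharvest(catalog, source))
```

In this sketch, a record that disappears from the WAF or SiteMap is dropped from the Catalog, which is why the source should always hold the complete set of records intended for publication.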
myReports allows users to view the status of their metadata harvests. myReports provides useful information about the harvesting success of submitted records, showing invalid web links and other issues to help data providers improve data quality.
The total number of records harvested into the Science Data Catalog is displayed for each data contributor at the bottom left of each data contributor view. To view the individual records that have been harvested, click the numerical value next to "Total Records Harvested" or select the link to view Science Data Catalog Results.
The harvested results are displayed on the search page where users are able to view more information and other related resources about the records.
The total number of records with failed links is displayed for each data contributor at the bottom right of each data contributor view. To view the individual records, each with a message describing what failed, click the numerical value next to "Total Records with Failed Links" or the Harvest History link.
The Harvest History tab displays a harvest summary for a specific time stamp. If you want to view your failed links, click the Failed Records tab.
For an example using the Upper Midwest Environmental Sciences Center Data contributor, the Failed Records tab displays a list of failed records along with the reasons they failed the validation process. Bad Links: If the field displays an associated link, the labeled hyperlink is incorrect or not currently active. If the field displays only "no onlink tag", no online_link attribute was found in the metadata record.
If you wish to download the failed records as a comma-separated values (CSV) file, click the download icon to start the file transfer. The file contains one row per record, with columns for the date processed, the source file location, and the badlink field.
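Once downloaded, the failed-records file can be summarized with a short script. This is a sketch only: the column headers `date_processed`, `source_file`, and `badlink` are assumed names for the three columns described above (the actual headers in the download may differ), and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample data; the header names are assumptions, not the real export format.
SAMPLE_CSV = """date_processed,source_file,badlink
2015-01-05,https://example.org/waf/a.xml,https://example.org/missing.zip
2015-01-05,https://example.org/waf/b.xml,no onlink tag
2015-01-05,https://example.org/waf/c.xml,no onlink tag
"""


def summarize_failures(csv_file):
    """Count failed records by kind: a broken link versus no link at all."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["badlink"].strip() == "no onlink tag":
            counts["missing link"] += 1
        else:
            counts["broken link"] += 1
    return dict(counts)


print(summarize_failures(io.StringIO(SAMPLE_CSV)))
```

To run it against a real download, pass an open file instead of the in-memory sample, for example `open("failed_records.csv", newline="")`, where the file name is whatever the download was saved as.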
How to Search for USGS Data
The USGS Science Data Catalog enables the search and retrieval of USGS data sources. A multitude of search facets and filter options allows a user to refine searches to find the data they are looking for.
The Science Data Catalog search tab begins with a display of all datasets that are described in the Catalog.
Ordering, Sorting & Display of Results
By default, results are sorted by Relevance ranking, but a user can re-sort them on other parameters using the Sort By pulldown menu. A user can also change the number of results per page from the default of 10 to 50 or 100 per page.
Types of Data Available
All items in the Science Data Catalog contain a link to data or a data service, accessible via the colored button(s) beneath the brief description of the dataset. Sometimes the link will take a user to a downloadable dataset, which may take the form of a zip file, an individual spreadsheet, a topographic map, or a USGS Data Series Report. Other links may land on an online data system containing hundreds or even thousands of individual datasets, where the user will be prompted to execute a more specific search. Finally, a number of data sources in the USGS provide access to data via APIs and services, including map services.
The challenge with a text search against metadata records created by hundreds of different data producers is knowing which keywords they used to describe their data. For example, one ecologist may use the term "non-native plants" to describe the type of plant they are studying, while another might call it a "non-indigenous plant" or an "exotic plant."
In the near future, the USGS Science Data Catalog will address this problem by embedding thesauri behind the search interface to automatically search against known synonyms for common scientific concepts. For instance, if "non-native plants" is entered, the Catalog will return records with that phrase in the key metadata fields, and it will also return records using "non-indigenous plants" and "exotic plants." At this time, however, we suggest running separate searches with any synonyms known to you entered into the text search box, to ensure that no pertinent datasets are missed.
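The idea of thesaurus-backed searching can be illustrated with a small sketch. Everything here is hypothetical: the `SYNONYMS` table, the function names, and the sample record titles are invented for illustration, and the planned Catalog feature may work quite differently.

```python
# Hypothetical thesaurus: each preferred phrase maps to its known synonyms.
SYNONYMS = {
    "non-native plants": {"non-indigenous plants", "exotic plants"},
}


def expand_query(phrase):
    """Return the phrase together with every known synonym, in sorted order."""
    phrase = phrase.lower().strip()
    for preferred, alternates in SYNONYMS.items():
        group = {preferred} | alternates
        if phrase in group:
            return sorted(group)
    return [phrase]


def search(records, phrase):
    """Return records whose text contains the phrase or any of its synonyms."""
    variants = expand_query(phrase)
    return [r for r in records if any(v in r.lower() for v in variants)]


# Invented record titles.
RECORDS = [
    "Survey of non-native plants in riparian zones",
    "Distribution of exotic plants on federal lands",
    "Bathymetry of Monterey Bay",
]

print(search(RECORDS, "non-native plants"))
```

Until such a feature exists, the same effect can be achieved manually by running one search per synonym and combining the results.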
Use the search box to type a specific term on which to search (e.g. bathymetry). The total number of results in the Catalog will be reduced to those containing the term 'bathymetry' in one of the key metadata fields given priority in the indexing process. The Catalog might also provide some Related terms that can be selected to re-run the query.
The total number of datasets in the Catalog has been reduced to those containing the term 'bathymetry' in one of the key metadata fields given priority in the indexing process.
There are a few choices for proceeding from this set of results. You can:
- Browse through the results presented in the list of relevant datasets.
- Use a Filter from the list in the left column to add another parameter that will further narrow the search (this will AND the term 'bathymetry' to the filter selected).
- Add an additional text query term to 'bathymetry' in the text search box (this will AND the added term to the existing term, 'bathymetry').
- Create a geographic polygon that will restrict the search for 'bathymetry' to datasets for a certain location.
- Use the "Related Terms" to explore related concepts to the term 'bathymetry.'
- Delete the 'bathymetry' query and start over with a brand new search.
Adding an Additional Query to Text Searches
In the previous example, a text search on bathymetry produced 695 results, too many items to easily scan. Perhaps you are interested specifically in bathymetric data for Monterey Bay. You can search within the original 'bathymetry' results set by typing an additional query into the text search box:
Add an additional query term to the search (e.g. Monterey Bay) and the results will include records with both terms, 'bathymetry' AND 'Monterey Bay', appearing in one or more key metadata fields.
These 7 final results include records with both terms appearing in one or more key metadata fields.
To start a new search, clear out any previous search parameters displayed. Delete all query strings by clicking on the X to remove them. This will reset the search to the default state for a new query. If a previous search has been executed but not yet cleared, this will be displayed:
The easiest way to confirm that a new search is executing from the default state is to make sure that the words "All Catalog Holdings" appear beneath "Current Selection(s)" and that there are no query statements visible:
By default, compound queries are ANDed together to narrow the results set. In other words, the returned results must contain all query terms in the key metadata fields.
For example, if you are looking for datasets that address the degree of the nation's shoreline change, you might type the term, rate of shoreline change:
Or you could have worded it slightly differently to eliminate the 'of', as shoreline change rate. Notice that the results set is the same:
The search simply ANDs all of the concepts together, ignoring the 'of' in the first example. All records that contained all three terms (shoreline AND rate AND change) in any of the key metadata fields were matched and returned. The matching records may or may not contain the exact phrases "rate [of] shoreline change" or "shoreline change rate"; they may instead contain the words 'rate' and 'change' in the title, and contain 'shoreline' somewhere in the Description field, for example. This means that some of the results in our example above may not deal specifically with rates of shoreline change, but could instead include a record on rates of change in seagull populations in shoreline versus urban areas.
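The ANDing behavior described above can be modeled in a few lines. This is a simplified sketch, not the Catalog's search engine: the stopword list, the tokenization rule, the choice of Title and Description as the only "key metadata fields", and the sample records are all assumptions made for illustration.

```python
import re

# Assumed list of ignored words; the Catalog's actual list is not documented here.
STOPWORDS = {"of", "the", "a", "an", "in"}


def terms(text):
    """Lowercase the text, split it into words, and drop ignored words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def and_search(records, query):
    """Return titles of records that contain every query term in any searched field."""
    wanted = terms(query)
    return [
        r["title"]
        for r in records
        if wanted <= terms(r["title"] + " " + r["description"])
    ]


# Invented records for illustration.
RECORDS = [
    {"title": "Rate of shoreline change along the Gulf coast",
     "description": "Long-term shoreline positions."},
    {"title": "Rate of change in seagull populations",
     "description": "Counts in shoreline versus urban areas."},
    {"title": "Bathymetry of Monterey Bay",
     "description": "Multibeam survey."},
]

print(and_search(RECORDS, "rate of shoreline change"))
print(and_search(RECORDS, "shoreline change rate"))
```

Both queries return the same two records, including the seagull study, because word order and the word 'of' play no part in the match.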
If you are looking for a very specific concept, it often helps to use quotation marks around the phrase to search for the exact wording occurring in one or more key metadata fields.
In this case, the few items found contain the phrase "rate of shoreline change", which is the exact phrase that appears in the Title and/or Description field for these records.
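By contrast with an ANDed search, an exact-phrase search requires the words to appear together and in order within a single field. The sketch below models that under assumptions: the function name, the two searched fields, and the sample records are hypothetical, and the Catalog's actual phrase matching may normalize text differently.

```python
def phrase_search(records, phrase):
    """Return titles of records where the exact phrase occurs within one field."""
    # Normalize case and whitespace so spacing differences do not block a match.
    needle = " ".join(phrase.lower().split())
    hits = []
    for record in records:
        for field in ("title", "description"):
            haystack = " ".join(record[field].lower().split())
            if needle in haystack:
                hits.append(record["title"])
                break  # one matching field is enough
    return hits


# Invented records for illustration.
RECORDS = [
    {"title": "Rate of shoreline change along the Gulf coast",
     "description": "Long-term shoreline positions."},
    {"title": "Rate of change in seagull populations",
     "description": "Counts in shoreline versus urban areas."},
]

print(phrase_search(RECORDS, "rate of shoreline change"))
```

Only the first record matches here; the seagull record contains all of the individual words but never the phrase itself.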
There are two ways to conduct a geospatial search on datasets in the Catalog:
- Geospatial keywords that describe the area of study (e.g. Alaska, Green River, Bakken Formation, Acadia National Park, Gulf of Mexico).
- Bounding coordinates that give the limits of coverage of a dataset: western-most, eastern-most, northern-most, and southern-most.
Geospatial keyword searches work best when searching for data from a very specific location. Research studies that are very localized are usually described by metadata that contain those location terms in the Title, Description, and Place Keyword fields. The key thing to remember is that the user is searching on words in the metadata. The Catalog currently does not translate text place keywords into coordinates, so keywords must match the textual metadata in order to return search results.
Search for geospatial keywords by entering them into the text search query and the results should appear at or near the top of relevance-ranked results.
Bounding coordinates ("limit search by location")
Searching against the bounding coordinates in the metadata record is recommended when looking for datasets for a more general geographic area, and not a very specific named place.
Pan, zoom, and draw a bounding box on the map to specify an area of interest, or use the dropdown menus to select pre-defined polygons for U.S. states, for countries, and for oceans and major water bodies of the world.
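A search against bounding coordinates amounts to testing whether the box drawn on the map overlaps the box recorded in each metadata record. The sketch below shows one common way to make that test. It rests on assumptions: the `BBox` class and `intersects` function are hypothetical names, the coordinates are rough illustrative values in decimal degrees, boxes that cross the antimeridian are not handled, and the Catalog may apply a different rule (for example, requiring full containment).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding coordinates in decimal degrees (west/east longitude, south/north latitude)."""
    west: float
    east: float
    south: float
    north: float


def intersects(a, b):
    """Return True if the two boxes overlap or touch."""
    return not (
        a.east < b.west or a.west > b.east or a.north < b.south or a.south > b.north
    )


# Approximate, illustrative extents only.
dataset = BBox(west=-122.1, east=-121.8, south=36.6, north=37.0)       # a Monterey Bay survey
california = BBox(west=-124.5, east=-114.1, south=32.5, north=42.0)    # search box over California
alaska = BBox(west=-170.0, east=-130.0, south=51.0, north=72.0)        # search box over Alaska

print(intersects(dataset, california))
print(intersects(dataset, alaska))
```

Because the test uses only the four bounding coordinates, a dataset can match a search box even when its actual footprint lies outside the area of interest but its rectangular extent overlaps it.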
How to Contribute to the Science Data Catalog
The decision to contribute USGS metadata records to the Science Data Catalog begins at the science center or program level. Follow the steps below to help ensure that metadata meet all requirements and configurations for inclusion into the Science Data Catalog.
The metadata record(s):
- Must be in Extensible Markup Language (XML) file format to be machine readable (i.e., not a Word document or PDF).
- Must follow the Federal Geographic Data Committee's (FGDC) endorsed Content Standard for Digital Geospatial Metadata (CSDGM).
The USGS Data Management Website provides non-prescriptive data management guidance, best practices, tools, and resources in one convenient location. Learn to create metadata resources to contribute to the USGS Science Data Catalog. To learn more about metadata standards and metadata creation, see the About page.
The Science Data Catalog harvests metadata records from online sources. Metadata intended for inclusion in the Catalog should be organized in a single online source for harvest. Centers and programs can provide their metadata collections to the Catalog from their own public servers, or leverage a remote data management platform to manage and serve their records.
The Science Data Catalog can harvest metadata record(s) from three types of sources:
- Web Accessible Folder (WAF): A URL address that is an online public folder containing the metadata record(s) for harvest. Use this option if all metadata records are stored in one online location.
- Site Map XML: A URL address of a Site Map XML file that lists the URLs of metadata records hosted from multiple online locations. Use this option if the metadata are stored in different online locations or in a metadata catalog.
- ScienceBase: The name of the folder containing the metadata records cataloged in ScienceBase, a collaborative data management platform, designed to help science teams organize, store and share information. Use this option if you want to store the metadata records in ScienceBase. Visit Using ScienceBase to Contribute Metadata for further instructions on how to begin.
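For the Site Map XML option, the file is essentially a list of record URLs. The sketch below generates one, assuming the Catalog accepts the standard sitemaps.org format (a `urlset` element containing `url`/`loc` entries); that assumption, the function name `build_sitemap`, and the example URLs are not taken from the Catalog's documentation, so confirm the expected format before registering a source.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(record_urls):
    """Return a sitemap XML string listing one <url><loc> entry per metadata record."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for record_url in record_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = record_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical record locations on two different servers.
urls = [
    "https://example.org/metadata/bathymetry_monterey.xml",
    "https://data.example.net/records/shoreline_change.xml",
]
print(build_sitemap(urls))
```

The resulting file would be published at a stable public URL, and that URL is what gets registered as the data source.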