Clearinghouse Concepts Q&A
This document describes the context of the National Geospatial Data Clearinghouse Network and details of its construction and operation.
- What is Clearinghouse?
- Why promote the Clearinghouse Activity?
- Why not just use Internet search engines?
- Who should participate in Clearinghouse?
- What are the requirements for being a Clearinghouse provider and user?
- What information is accessible through Clearinghouse?
- How does Clearinghouse work?
What is Clearinghouse?
The NSDI Clearinghouse Network, sponsored by the FGDC, is a distributed system of agency servers located on the Internet that contain field-level descriptions of available and planned digital spatial data, applications, and services. This descriptive information, known as metadata, is collected in a standard format to facilitate query and consistent presentation across multiple participating sites. Clearinghouse uses standards-based Web technology for the publication and discovery of available geospatial resources through the Geospatial Platform portal.
The fundamental goal of Clearinghouse is to provide access to digital spatial data and related online services for data access, visualization, or order. The Clearinghouse Network functions as a detailed catalog service with support for links to spatial data and browse graphics. Clearinghouse metadata entries are expected to include hyperlinks to online resources (e.g. map services, data download locations, data access services, applications) to enable access to all facets of the described resource. Where digital data are too large to be made available through the Internet or the data products are made available for sale, a link to an order form can be provided in lieu of a data set. Through this model, Clearinghouse metadata provide low-cost advertising for providers of spatial data, both non-commercial and commercial, to potential customers via the Internet.
Clearinghouse allows individual agencies, consortia, or geographically-defined communities to band together and promote their available digital spatial data through a federated metadata service. These servers may be installed at local, regional, or central offices, as dictated by the organizational and logistical efficiencies of each organization. All Clearinghouse servers are considered "peers" within the Clearinghouse activity -- there is no hierarchy among the servers -- permitting query by any user on the Internet with minimum transactional processing. When these Clearinghouse services are registered with the Platform portal, the system will harvest and cache a copy of the metadata for rapid retrieval, enabling search through a single interface to all registered assets in the U.S.
Why promote the Clearinghouse Activity?
The development of the Clearinghouse among U.S. Federal agencies was motivated by a desire to minimize duplication of effort in the collection of expensive digital spatial data and to foster cooperative digital data collection activities. By promoting the availability, quality, and requirements for digital data through a searchable on-line system, a Clearinghouse facility would greatly assist in the coordination of data collection and research activities. Clearinghouse also provides a primary data dissemination mechanism to traditional and non-traditional spatial data users.
Federal participation in the Clearinghouse is directed by Executive Order 12906 through its official creation of the National Spatial Data Infrastructure. The FGDC is co-chaired by senior officials in the Department of the Interior and the Office of Management and Budget.
Why not just use Internet search engines?
Digital spatial data and metadata are stored in many forms and systems, which makes their discovery on the Internet difficult. Structured metadata is typically exchanged in XML format, with significant meaning stored in 'fields' or XML elements, rather than in the HTML documents typically indexed by search engines. Current web indexing technology offers literal text search and matching for metadata that happen to be stored in HTML, but does not generally provide the indexing required for search of coordinates, dates and times, and other numeric values. In addition, some entire collections of metadata are managed within dynamic databases whose content is not accessible to search engines. The Clearinghouse functionality as implemented in the geodata.gov portal goes beyond existing search engine technology to include spatial query and to permit simple search of metadata by location and full text. Field-level search is also available to refine searches based on topical classification, geography, time, and other key fields in ways not possible with off-the-shelf search engine technology.
The general trend toward connectivity of spatial data producers, vendors, and users on the Internet, coupled with the provision of online data via web services, indicates a long-term public commitment not only to on-line data discovery but also to direct data access by client processes across internal and public networks. Clearinghouse provides a standards-based solution to catalog interoperability on the Internet today.
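To illustrate why field-level indexing matters, the following is a minimal sketch of a spatial query over a metadata record. It assumes a heavily trimmed, hypothetical record that uses the FGDC bounding-coordinate elements (westbc, eastbc, northbc, southbc); the record content, the function names, and the query boxes are invented for illustration and do not describe how the portal itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed FGDC-style record (illustrative only).
RECORD_XML = """<metadata>
  <idinfo>
    <citation><citeinfo><title>Sample Hydrography Layer</title></citeinfo></citation>
    <spdom><bounding>
      <westbc>-109.05</westbc><eastbc>-102.04</eastbc>
      <northbc>41.00</northbc><southbc>36.99</southbc>
    </bounding></spdom>
  </idinfo>
</metadata>"""


def bounding_box(xml_text):
    """Return (west, south, east, north) as floats from the bounding fields."""
    root = ET.fromstring(xml_text)
    bounding = root.find("idinfo/spdom/bounding")
    return tuple(
        float(bounding.findtext(tag))
        for tag in ("westbc", "southbc", "eastbc", "northbc")
    )


def boxes_intersect(a, b):
    """True if two (west, south, east, north) boxes overlap.

    Simplification: boxes that cross the 180-degree meridian are not handled.
    """
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    return (a_west <= b_east and b_west <= a_east
            and a_south <= b_north and b_south <= a_north)


record_box = bounding_box(RECORD_XML)
query_inside = (-105.5, 39.0, -104.5, 40.0)   # overlaps the record's extent
query_far = (-80.0, 25.0, -70.0, 30.0)        # does not overlap

print(boxes_intersect(record_box, query_inside))
print(boxes_intersect(record_box, query_far))
```

A literal text match on the record would find neither query, since the coordinates being searched never appear as strings in the metadata; the comparison only works once the bounding fields are parsed as numbers.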
Who should participate in Clearinghouse?
Although initially targeted at federal agencies, the NSDI Clearinghouse Network includes numerous federal, state, university, and tribal metadata collections. Hundreds of metadata servers are also in operation outside the United States supporting the same interoperability standards. In short, any group, regardless of size, may publish its metadata to the Clearinghouse and make it visible in geodata.gov. Similar publishing portals exist in other countries for the coordination and publication of geographic resources outside the U.S. The federated catalog behind the NSDI Clearinghouse Network is also registered with the Group on Earth Observations (GEO) and its Global Earth Observation System of Systems (GEOSS). Thus U.S. content is now also visible via the GEO Web Portal.
The role of the FGDC in Clearinghouse is to collect stakeholder requirements and to design and deploy federated search, discovery, and access solutions for the U.S. geospatial community. The Geospatial Platform, in concert with the data.gov initiative, provides community coordination of the Clearinghouse, catalog, and its contributions to visualization, analysis, and application development in the emerging Platform environment. It is not the intent of the FGDC to create a centralized data system but to facilitate access to agency-operated distributed stores of spatial metadata, data, and services on the Internet.
What are the requirements for being a Clearinghouse provider and user?
A prospective spatial data publisher must have a public-facing web server with online access to metadata, catalogs, and spatial data. It is recommended that metadata services be co-located on hosts with spatial data collections to encourage synchronization between the spatial data, services, and the metadata being served. A publisher can share metadata through either 1) a Z39.50 server, 2) an OGC Catalog Server (CSW), or 3) a Web Accessible Folder (WAF) -- a browse-enabled directory on a host organization's web server that holds the XML metadata for direct harvest by the portal. An online registry is operated by the FGDC to track the operating details of existing Clearinghouse metadata services. Prospective users of Clearinghouse must have access to a current Web browser with a broadband connection to the Internet. Search and visualization interfaces exist at geo.data.gov and GeoPlatform.gov to provide custom levels of search access.
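As a rough illustration of the Web Accessible Folder option, the sketch below shows how a harvester might pick the XML metadata files out of a browse-enabled directory listing. The listing HTML, the example.org URL, and the names LinkCollector and list_waf_records are all hypothetical; real directory listings vary by web server, and this is not the portal's actual harvesting code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_waf_records(listing_html, base_url):
    """Return absolute URLs of the .xml files linked from a directory listing."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if href.lower().endswith(".xml")
    ]


# Hypothetical directory listing, as a web server might render it.
SAMPLE_LISTING = """<html><body><h1>Index of /metadata</h1>
<a href="../">Parent Directory</a>
<a href="roads_2010.xml">roads_2010.xml</a>
<a href="hydro.XML">hydro.XML</a>
<a href="readme.txt">readme.txt</a>
</body></html>"""

records = list_waf_records(SAMPLE_LISTING, "https://example.org/metadata/")
for url in records:
    print(url)
```

In practice the listing would be fetched over HTTP and each resulting URL downloaded and parsed; those steps are omitted here so the example runs without network access.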
What information is accessible through Clearinghouse?
A "digital geospatial data set" is the primary item being described with metadata in the Clearinghouse activity. The definition of a data set can be adjusted to meet a given agency's requirements but it generally corresponds to individual identifiable data products (e.g. file, layer, service) for which metadata are customarily collected. This may equate to a specific satellite image, a shapefile, or a national vector data set, as managed by a data producer or distributor. Collections of data sets (e.g. flight lines, satellite "paths", map or data series) may also have generalized metadata that could be inherited by individual data sets.
Other geospatial resources may be described in the FGDC or ISO metadata, including online services (Web Map Service, Web Feature Service), data download locations, interactive web applications, documents, and other web-accessible resources. The Geospatial Data Presentation Form field in the metadata record can store this information, though other context can be inferred from the style of the URL. Also, FGDC metadata allows for multiple online linkages to be maintained in a metadata record, so multiple facets of the geospatial resource may be described.
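The idea of several online linkages in one record can be sketched as follows. The example assumes a trimmed, hypothetical FGDC-style record with the geoform (Geospatial Data Presentation Form) and onlink (Online Linkage) elements; the guess_kind heuristic and its categories are invented here purely to illustrate inferring context from the style of a URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed FGDC-style record (illustrative only).
RECORD_XML = """<metadata><idinfo><citation><citeinfo>
<title>Sample Elevation Service</title>
<geoform>raster digital data</geoform>
<onlink>https://example.org/wms?service=WMS&amp;request=GetCapabilities</onlink>
<onlink>https://example.org/downloads/elevation.zip</onlink>
</citeinfo></citation></idinfo></metadata>"""


def guess_kind(url):
    """Very rough, assumed heuristic for classifying a linkage by URL style."""
    lower = url.lower()
    if "service=wms" in lower:
        return "map service"
    if "service=wfs" in lower:
        return "feature service"
    if lower.endswith((".zip", ".tar.gz")):
        return "download"
    return "other"


def describe_links(xml_text):
    """Return the presentation form and a list of (url, guessed kind) pairs."""
    root = ET.fromstring(xml_text)
    cite = root.find("idinfo/citation/citeinfo")
    geoform = cite.findtext("geoform", default="")
    urls = [el.text.strip() for el in cite.findall("onlink") if el.text]
    return geoform, [(url, guess_kind(url)) for url in urls]


geoform, links = describe_links(RECORD_XML)
print(geoform)
for url, kind in links:
    print(kind, url)
```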
How does Clearinghouse work?
To provide search interoperability among different servers of geospatial metadata, the search and retrieve protocol known as ANSI Z39.50-1995 (ISO 23950) was initially selected by the FGDC Clearinghouse activity. Although still in use by a few organizations today, Z39.50 has been effectively replaced by the Open Geospatial Consortium (OGC) Catalog Services specification, more specifically the HTTP version known as Catalog Service for the Web (CSW). Multiple catalog services and metadata collections (WAF) are registered with the GeoPlatform.gov site. A periodic harvest of all metadata is performed, and all metadata are indexed for search, as if all the metadata and data resources were consolidated in one location, though they are actually distributed among the agencies. This federated model preserves the notion of 'data closest to source', allowing agencies full control of the content, metadata, and update frequency.
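The harvest-and-index model described above can be sketched in a few lines. This is a toy in-memory illustration under stated assumptions: the HarvestIndex class, the source names, and the title-only records are hypothetical, and a real catalog would index many more fields and persist its cache.

```python
from collections import defaultdict


class HarvestIndex:
    """Toy central index over metadata harvested from distributed sources."""

    def __init__(self):
        self.records = {}               # (source, record_id) -> title
        self.terms = defaultdict(set)   # lowercase word -> set of record keys

    def harvest(self, source, records):
        """Replace the cached copy of one source with its current records.

        The source stays authoritative: whatever it publishes at harvest
        time overwrites what the index previously held for that source.
        """
        stale = [key for key in self.records if key[0] == source]
        for key in stale:
            del self.records[key]
        for keys in self.terms.values():
            keys.difference_update(stale)
        for record_id, title in records.items():
            key = (source, record_id)
            self.records[key] = title
            for word in title.lower().split():
                self.terms[word].add(key)

    def search(self, word):
        """Return matching (source, record_id) keys across all sources."""
        return sorted(self.terms.get(word.lower(), set()))


index = HarvestIndex()
index.harvest("agency-a", {"r1": "Colorado Roads"})
index.harvest("agency-b", {"x": "Colorado Hydrography"})
first_search = index.search("Colorado")

# A later harvest picks up agency-a's changes; agency-b is untouched.
index.harvest("agency-a", {"r2": "Utah Roads"})
second_search = index.search("Colorado")

print(first_search)
print(second_search)
```

One search call covers every registered source, while each source's entries change only when that source is harvested again, which is the sense in which the agencies keep control of content and update frequency.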