FGDC Metadata and Clearinghouse
"Metadata" is a term used within the computer science community to denote the characteristics or quality of data. Definitions of what counts as metadata vary, mostly according to how the metadata is expected to be used. At one extreme, metadata may mean absolutely everything ancillary to the "datum" or measurement (so that there is little real data and a great deal of metadata). At the other extreme, metadata may be just a few identified properties that support a particular application.
In 1994, the spatial data user community sought consensus on which metadata should be maintained with digital spatial data sets to serve multiple uses. The preamble to the FGDC Content Standard for Digital Geospatial Metadata defines the anticipated primary uses of geospatial metadata:
- to help organize and maintain an organization's internal investment in spatial data,
- to provide information about an organization's data holdings to data catalogues, clearinghouses, and brokerages, and
- to provide information to process and interpret data received through a transfer from an external source.
The standard emphasizes the use of metadata elements in a discovery and query environment, where they provide "fitness for use" information to prospective users of digital geospatial data. Executive Order 12906 (April 1994) directs federal agencies to document new geospatial data using the FGDC metadata standard, which the FGDC approved in June 1994. The Order calls for the inventory and documentation of digital spatial data in the federal government and for serving that metadata through the Clearinghouse.
The FGDC metadata standard is organized into approximately 300 data elements, 199 of which can be valued; the remainder are grouping (compound) elements that give the standard its overall structure. Of these 199 elements, several dozen are classed as "mandatory" and around 100 as "mandatory if applicable", a classification that requires interpretation by the user. The rest are optional; they provide structured places for information that would otherwise be lost in generic comment fields.
The Standard is divided into ten sections, and each data element is numbered to ease navigation by human readers. Only the mandatory elements within the Identification and Metadata Reference sections are strictly required to prepare a bare-minimum metadata entry. In practice, however, such a limited subset is of little use for appraising data quality.
Optionality and Repeatability
Close review of the FGDC Metadata Standard reveals a set of "production rules" that govern the syntax of content with respect to optionality, repeatability, and restricted domains.
If a compound element is optional, the elements "under" it are effectively optional as well. Traversing the standard therefore reveals a cascade of decision points, like branches in a tree: once a branch is included, the remaining "leaves" on that branch must be considered. An optional or mandatory-if-applicable branch may be omitted entirely. The structure of the standard does not, however, permit isolated elements to be included on their own, because outside their parent compound they are out of context.
Elements are designated repeatable if they (and all their subelements) can be filled out multiple times.
Some elements have a restricted domain: a specific set of values from which the entry must be chosen. Some of these domains do not allow free-text or user-defined values, which makes searching more reliable.
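The three production rules can be made concrete with a small validator. This is a minimal sketch, not an FGDC tool: the `Spec` class, the `validate` function, and the obligation codes `"M"`, `"MA"`, and `"O"` are hypothetical, and the element fragment is only loosely modeled on the standard's Status and Keywords elements, with illustrative obligations. It checks mandatory children only when their parent branch is present, rejects repeats of non-repeatable elements, rejects elements used outside their parent's context, and enforces restricted domains.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Spec:
    """One element of a simplified, hypothetical standard definition."""
    name: str
    obligation: str = "M"          # "M" mandatory, "MA" if applicable, "O" optional
    repeatable: bool = False
    domain: Optional[set] = None   # restricted domain for a leaf element, if any
    children: list = field(default_factory=list)  # non-empty means compound


def validate(spec: Spec, value: Any, path: str = "") -> list:
    """Return rule violations for one element and everything beneath it."""
    here = f"{path}/{spec.name}"
    errors = []
    occurrences = value if isinstance(value, list) else [value]
    if len(occurrences) > 1 and not spec.repeatable:
        errors.append(f"{here}: element is not repeatable")
    for item in occurrences:
        if spec.children:  # compound element: check its branch
            if not isinstance(item, dict):
                errors.append(f"{here}: expected a compound element")
                continue
            allowed = {child.name for child in spec.children}
            for key in item:
                if key not in allowed:  # element used outside its context
                    errors.append(f"{here}/{key}: not allowed here")
            for child in spec.children:
                if child.name in item:
                    errors += validate(child, item[child.name], here)
                elif child.obligation == "M":
                    # "O" and "MA" children may be omitted; whether an "MA"
                    # element applies needs human judgment, so it is not enforced.
                    errors.append(f"{here}/{child.name}: mandatory element missing")
        elif spec.domain is not None and item not in spec.domain:
            errors.append(f"{here}: {item!r} is not in the restricted domain")
    return errors


# Fragment loosely modeled on the Status and Keywords elements (illustrative).
status = Spec("Status", children=[
    Spec("Progress", domain={"Complete", "In work", "Planned"}),
    Spec("Maintenance_and_Update_Frequency"),
])
keywords = Spec("Keywords", children=[
    Spec("Theme", repeatable=True, children=[
        Spec("Theme_Keyword_Thesaurus"),
        Spec("Theme_Keyword", repeatable=True),
    ]),
    Spec("Place", obligation="O", repeatable=True, children=[
        Spec("Place_Keyword_Thesaurus"),
        Spec("Place_Keyword", repeatable=True),
    ]),
])

# The optional Place branch is omitted, so its mandatory leaves are not checked.
print(validate(keywords, {"Theme": [{"Theme_Keyword_Thesaurus": "None",
                                     "Theme_Keyword": ["hydrography", "streams"]}]}))  # []
print(validate(status, {"Progress": "Finished"}))
```

Leaving `"MA"` unenforced mirrors the point above that this classification requires the user's interpretation; a real validator would need a rule, or a prompt to the user, for each such element.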
The Content Standard standardizes only the logical structure and content of geospatial metadata. It does not provide a physical model, such as an implementation in a relational database, although it can be implemented within such a system. Nor does it specify a presentation or exchange format. An organization can technically claim to have FGDC-compliant metadata by storing metadata in fields that are semantically equivalent to the FGDC tags and organizational structure, yet it may be unable to produce metadata entries in a recognizable format.
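To illustrate the gap between logical content and physical storage, the sketch below keeps metadata in a relational table whose column names are an organization's own, then uses a crosswalk to emit recognizable element names. The table `dataset_md`, its columns, and the `CROSSWALK` mapping are hypothetical; only Python's standard `sqlite3` module is used. The output is deliberately flat: a complete export would also have to nest Title under Citation_Information and Abstract and Purpose under Description, which is exactly the structural knowledge a merely "semantically equivalent" database may lack.

```python
import sqlite3

# Hypothetical in-house table: the column names are the organization's own,
# semantically equivalent to, but not spelled like, FGDC element names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_md (ds_title TEXT, ds_abstract TEXT, ds_purpose TEXT)")
conn.execute(
    "INSERT INTO dataset_md VALUES (?, ?, ?)",
    ("Stream centerlines", "Digitized stream network.", "Hydrologic modeling."),
)

# Hypothetical crosswalk from local columns to standard element names.
CROSSWALK = {"ds_title": "Title", "ds_abstract": "Abstract", "ds_purpose": "Purpose"}


def export_tag_value(row: sqlite3.Row) -> str:
    """Emit stored fields as flat 'Element_Name: value' lines via the crosswalk."""
    return "\n".join(f"{CROSSWALK[col]}: {row[col]}" for col in row.keys())


conn.row_factory = sqlite3.Row  # rows then support access by column name
row = conn.execute("SELECT * FROM dataset_md").fetchone()
print(export_tag_value(row))
```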
This lack of rigor in the Metadata Standard is now being addressed through specific implementation guidelines within Clearinghouse, which require presentation and exchange formats so that search and presentation products can interoperate.
Metadata Format Issues
Valued metadata elements in the FGDC metadata content standard take the form:
- Element_Name: Element Value
This form has become the common plain-text rendering of FGDC metadata. Because it admits ambiguities (capitalization, a blank versus "_" in element names, indentation), it is not an ideal format for information exchange. Experimentation showed, however, that encoding metadata in this "tag: value" syntax, with indentation marking the beginning and end of compound elements, was an adequate minimum expression of metadata entries for use in Clearinghouse.
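A reader for this indented "tag: value" text can be sketched as follows. It is a minimal, hypothetical parser (`normalize` and `parse_indented` are illustrative names, not part of any FGDC tool): it folds case and treats blanks and underscores alike when matching element names, uses indentation depth to open and close compound elements, and collects repeated elements into lists. It assumes space indentation and single-line values; continuation lines of long wrapped values, which real files often contain, are not handled.

```python
def normalize(tag: str) -> str:
    """Fold case and treat blanks and underscores alike: 'Element Name' == 'element_name'."""
    return "_".join(tag.replace("_", " ").split()).lower()


def parse_indented(text: str) -> dict:
    """Parse indented 'tag: value' text into nested dicts (repeats become lists)."""
    root: dict = {}
    stack = [(-1, root)]  # (indentation of the open compound, its container)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        tag, _, value = raw.strip().partition(":")
        # A line no deeper than an open compound closes that compound.
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]
        node = value.strip() or {}  # empty value: start of a compound element
        key = normalize(tag)
        if key in parent:  # repeated element: keep every occurrence
            if not isinstance(parent[key], list):
                parent[key] = [parent[key]]
            parent[key].append(node)
        else:
            parent[key] = node
        if isinstance(node, dict):
            stack.append((indent, node))
    return root


sample = """\
Identification_Information:
  Citation:
    Citation_Information:
      Originator: Example Agency
      Title: Stream centerlines
  Description:
    Abstract: Digitized stream network.
Metadata_Reference_Information:
  Metadata_Date: 19950301
"""
record = parse_indented(sample)
print(record["identification_information"]["citation"]["citation_information"]["title"])
```

Because `normalize` folds the spelling variants listed above into one key, "Element Name" and "element_name" end up in the same slot; a stricter exchange format removes the need for such guesswork.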
Because of these potential ambiguities, it was noted at the first Metadata Implementors' Workshop, held in Reston in March 1995, that defined formats for metadata exchange and presentation would be required for Clearinghouse, and any viable metadata entry tools, to succeed.
Metadata formats for Clearinghouse
Two formatting requirements were defined early on in attempts to implement the Metadata Standard:
- a computer-readable format for data exchange and
- standard methods of presenting the metadata contents
Although it is tempting to lump form and content together (e.g., "What I see in my database is what I print"), separating the contents of the metadatabase (its columns or fields) from their presentation (formatted reports) is now commonplace in desktop database software. This separation gives users more flexibility in deciding what information to present and how.
For the exchange of FGDC metadata, XML is recommended, although its predecessor, SGML, may still be used or found in older collections on the Clearinghouse Network. XML provides a standard, structured representation of the metadata for use in software, and it can be transformed with a style sheet into other formats, such as HTML.
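As an illustration of the XML exchange form, the sketch below serializes a nested record with Python's standard `xml.etree.ElementTree` and reads a field back by path. The short tag names (`idinfo`, `citeinfo`, `descript`, `metainfo`, `metd`, and so on) follow the abbreviations commonly seen in FGDC XML files, but treat them as an assumption to check against the FGDC document type definition; `to_xml` and the record contents are hypothetical, and the record is abbreviated.

```python
import xml.etree.ElementTree as ET


def to_xml(tag: str, value) -> ET.Element:
    """Build an element: dicts become children, lists repeat the child tag."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            items = child_value if isinstance(child_value, list) else [child_value]
            for item in items:
                elem.append(to_xml(child_tag, item))
    else:
        elem.text = str(value)
    return elem


# Abbreviated record using short FGDC-style tag names (an assumption to verify).
record = {
    "idinfo": {
        "citation": {"citeinfo": {"origin": "Example Agency",
                                  "title": "Stream centerlines"}},
        "descript": {"abstract": "Digitized stream network.",
                     "purpose": "Hydrologic modeling."},
    },
    "metainfo": {"metd": "19950301"},
}

xml_text = ET.tostring(to_xml("metadata", record), encoding="unicode")
print(xml_text)

# Software on the receiving end reads fields by path, with no layout ambiguity.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("idinfo/citation/citeinfo/title"))  # Stream centerlines
```

Transforming such a file into HTML with an XSLT style sheet requires a third-party library such as lxml; the standard library's ElementTree does not implement XSLT.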
For presentation, the HyperText Markup Language (HTML) versions and the indented text versions are popular for posting on websites and supporting human readability.
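The separation of content from presentation can be shown by rendering one nested record two ways: as the indented text version and as HTML. This is a minimal sketch; `to_text`, `to_html`, and the choice of nested HTML definition lists are assumptions for illustration, not a prescribed Clearinghouse layout, and the record is abbreviated (mandatory elements such as the keyword thesaurus are omitted).

```python
from html import escape


def to_text(record: dict, depth: int = 0) -> list:
    """Indented 'Element_Name: value' lines; a compound gets a bare 'Name:' line."""
    lines = []
    for tag, value in record.items():
        for item in (value if isinstance(value, list) else [value]):
            if isinstance(item, dict):
                lines.append("  " * depth + f"{tag}:")
                lines += to_text(item, depth + 1)
            else:
                lines.append("  " * depth + f"{tag}: {item}")
    return lines


def to_html(record: dict) -> str:
    """Nested definition lists: one possible HTML view of the same content."""
    parts = ["<dl>"]
    for tag, value in record.items():
        for item in (value if isinstance(value, list) else [value]):
            parts.append(f"<dt>{escape(tag)}</dt>")
            body = to_html(item) if isinstance(item, dict) else escape(str(item))
            parts.append(f"<dd>{body}</dd>")
    parts.append("</dl>")
    return "".join(parts)


record = {
    "Identification_Information": {
        "Citation": {"Citation_Information": {"Title": "Stream centerlines"}},
        "Keywords": {"Theme": {"Theme_Keyword": ["hydrography", "streams"]}},
    },
}
print("\n".join(to_text(record)))
print(to_html(record))
```

Both renderers read the same structure, so changing the report layout never touches the stored content.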