Clearinghouse Working Group Status, 1997
1. Participation and Coordination
Clearinghouse WG teleconferences held
Agenda topics and notes for all recorded teleconferences are posted under the Participation topic of the Clearinghouse Home Page at: http://www.fgdc.gov/clearinghouse/participation/participation.html.
Clearinghouse Working Group Email Membership
At the close of 1997, there were over 300 registered email participants in the Clearinghouse Working Group list. This list, available as firstname.lastname@example.org, provides two-way communication about Clearinghouse topics, and is the primary means for technical dissemination of Clearinghouse information.
Clearinghouse Site Registry with over 50 member sites
All participating organizations that operate a compatible FGDC metadata service using the Z39.50 protocol are encouraged to register with the FGDC. At the end of 1997, over 50 sites, listed below, were registered and were thereby jointly searchable through the supported Clearinghouse interfaces. Existing Clearinghouses now cover a significant portion of the US (by state) and include an increasing number of federal participants.
Tutorial materials renovated
The online tutorial materials for setting up a Clearinghouse Node were updated and used in several training sessions in 1997. They include handouts for class attendees, example datasets, and, for the Windows platform, the set of Isite software required in a training class. The materials can be reached from the Clearinghouse Home Page or directly at: http://www.fgdc.gov/clearinghouse/tutorials/howto.html
Harmonization with GILS elements
In the course of working with other communities, it has been brought to our attention that the FGDC metadata elements, as encoded in the Z39.50 Profile, are not fully harmonized with the Government (or Global) Information Locator Service (GILS) used internationally, or are inconsistent with other internal documentation. For a search of an FGDC server by a GILS or FGDC client on the same search fields (e.g., title or abstract) to succeed, the fields in the server must be mapped correctly.
To correct these discrepancies, all registered Clearinghouse servers will receive instructions on how and when to update their fgdc.localmap files to support the correct numeric field tags. This will occur with the Final release of Isite (2.01) for all platforms.
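Checking a server's field mapping amounts to comparing its local table of field names and numeric tags against a reference table. The sketch below assumes a hypothetical `fgdc.localmap`-like format of one `fieldname tag` pair per line (the real Isite file format is not reproduced here), and uses the Bib-1 Use attributes Title (4) and Abstract (62) as illustrative reference tags, not as the official FGDC/GILS profile table.

```python
# Hypothetical reference table: field name -> numeric Z39.50 Use attribute.
# Title (4) and Abstract (62) are Bib-1 values used here for illustration only.
REFERENCE_TAGS = {"title": 4, "abstract": 62}


def parse_localmap(text):
    """Parse hypothetical 'fieldname tag' lines, ignoring blanks and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, tag = line.split()  # assumes exactly two tokens per line
        mapping[name.lower()] = int(tag)
    return mapping


def find_mismatches(local, reference=REFERENCE_TAGS):
    """Return {field: (local_tag, expected_tag)} for missing or wrong tags."""
    problems = {}
    for field, expected in reference.items():
        actual = local.get(field)  # None if the server does not map the field
        if actual != expected:
            problems[field] = (actual, expected)
    return problems
```

For example, a server whose map reads `abstract 2010` would be reported as `{'abstract': (2010, 62)}`, telling the administrator which line to update.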
Interoperability with Earth Observation Programs
In February, NASA and the European Centre for Earth Observation (CEO) held a three-day workshop entitled EO/GEO to openly discuss projects and activities relating to the earth observation community (satellite folks) and the geospatial data community (GIS folks). The presentations made it clear that both communities face similar data discovery problems and share needs for standardization. The workshop concluded with the following observations (and hopefully actions):
- to pursue mappings of schemas and semantics across different disciplines to broaden community benefit
- data format conventions and translators for spatial data similarly need to be made widely available to the developer community
- search across multidisciplinary data collections requires use of one or few well-known thesauri or controlled vocabularies. The NASA Global Change Master Directory project has a good candidate offering for this
- applications are being built using the Java language for flexibility and portability. An EO/GEO repository should be developed to cut development costs by re-using code.
- demonstrate consistency in search across EO and GEO catalogs, specifically between the FGDC metadata and the Catalogue Interoperability Profile (CIP) being developed by the Committee on Earth Observation Satellites (CEOS)
This last element has been actively pursued by the CEO, NASA, and FGDC since the February workshop. Specifications for how EO clients can find GEO data and vice versa are being developed and will be incorporated into the FGDC, ISO, and EO metadata and services work that is ongoing. A joint meeting between FGDC and CEOS staff was held in late 1997 to establish an architecture to make searching of imagery and GIS catalogs transparent to the end users. Both activities (FGDC and CEOS) are committed to supporting the ISO TC 211 metadata activity as the ISO Standard 15046-15 is approved. This will likely mean the development of a single, harmonized model for image and vector geospatial data catalogs.
ESA AVHRR Metadata entries online
In 1997, Clearinghouse access was provided, through a demonstrator site in Reston, to the imagery and quick-look data from the CD "AVHRR CD-Browser Ionia", published in 1992 by ESA ESRIN. The CD includes 1,915 browse images (GIF) and a product identifier for ordering. The demonstrator shows how other metadata, particularly imagery metadata, can be converted to an adequate FGDC representation for search, and then indexed and served with the Isite software and its FGDC support. The CEOS Inventory Exchange Format (IEF, 1992) records were parsed and converted into "full" but very minimal FGDC metadata entries: the conversion pulls about 15 fields from the IEF for each image and builds a retrieval URL pointing to the browse image files. The metadata and the perl program used are also available.
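The conversion step can be sketched as a function that takes one already-parsed IEF record and emits a minimal FGDC entry using CSDGM element short names (`idinfo`, `citeinfo`, `bounding`, `browse`, and so on). This is a Python illustration, not the perl program mentioned above; the record keys (`origin`, `browse_file`, `product_id`, ...) and the browse URL base are hypothetical, and the output covers only a few elements rather than a complete, validated record.

```python
from xml.sax.saxutils import escape


def wrap(tag, *children):
    """Enclose already-rendered child elements in <tag>...</tag>."""
    return f"<{tag}>" + "".join(children) + f"</{tag}>"


def leaf(tag, value):
    """Render a single text element, escaping &, < and >."""
    return f"<{tag}>{escape(str(value))}</{tag}>"


def ief_to_fgdc(rec, browse_base):
    """Map one parsed IEF record (hypothetical keys) to a minimal FGDC entry.

    Only a handful of CSDGM elements are produced; this is not a complete,
    validated FGDC record.
    """
    # Retrieval URL for the quick-look image, built from an assumed base URL.
    browse_url = browse_base.rstrip("/") + "/" + rec["browse_file"]
    return wrap(
        "metadata",
        wrap(
            "idinfo",
            wrap("citation", wrap("citeinfo",
                                  leaf("origin", rec["origin"]),
                                  leaf("pubdate", rec["date"]),
                                  leaf("title", rec["title"]))),
            wrap("descript",
                 leaf("abstract", rec["abstract"]),
                 leaf("purpose", rec["purpose"])),
            wrap("spdom", wrap("bounding",
                               leaf("westbc", rec["west"]),
                               leaf("eastbc", rec["east"]),
                               leaf("northbc", rec["north"]),
                               leaf("southbc", rec["south"]))),
            wrap("browse",
                 leaf("browsen", browse_url),
                 leaf("browsed", "Quick-look image, product " + rec["product_id"]),
                 leaf("browset", "GIF")),
        ),
    )
```

Output in this tagged form could then be handed to a tool such as mp to produce the HTML and TEXT variants.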
This demonstrates that nominally compliant FGDC metadata can be derived from other sources and, through the metadata parser (mp), rendered into other forms (HTML, TEXT, DIF), potentially on demand. Creating and indexing the metadata for the entire collection of almost 2,000 references, and writing out the HTML and TEXT variants (with SGML as the input to mp), takes approximately 5 minutes on a Pentium/Linux host. The metadata occupy 16 MB on disk and the index files 5 MB; the GIF quick-look imagery, by comparison, takes up about 370 MB. For static imagery collections like this one, generating and managing FGDC metadata is not a large system load in terms of either disk space or processing time. For larger collections of series metadata, however, or where an operational metadatabase already exists, an interface between the Z39.50 server and the database may still be warranted.
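The "not so large a system load" conclusion can be checked with a little arithmetic on the figures above. The sketch assumes all 1,915 images were processed, that 1 MB = 1024 KB, and that the 16 MB metadata figure covers all generated forms.

```python
def storage_summary(records=1915, metadata_mb=16, index_mb=5,
                    imagery_mb=370, build_s=5 * 60):
    """Per-record costs and metadata overhead for the AVHRR demonstrator."""
    return {
        "metadata_kb_per_record": metadata_mb * 1024 / records,
        "index_kb_per_record": index_mb * 1024 / records,
        "imagery_kb_per_record": imagery_mb * 1024 / records,
        # Metadata plus index, as a fraction of the imagery they describe.
        "overhead_vs_imagery": (metadata_mb + index_mb) / imagery_mb,
        "seconds_per_record": build_s / records,
    }


if __name__ == "__main__":
    for name, value in storage_summary().items():
        print(f"{name}: {value:.3f}")
```

Roughly 8.6 KB of metadata and 2.7 KB of index per image sit alongside about 198 KB of GIF, so metadata and index together add under 6% to the imagery's footprint, at about 0.16 seconds of processing per record.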
OpenGIS and FGDC Collaboration in WWW Mapping
The OpenGIS Consortium (OGC) recently approved a public specification, called RFP-1, which provides low-level interoperability for the spatial characteristics of point, line, and area features in a vector GIS. It is the foundation on which arbitrary remote clients can access collections of digital spatial data held in proprietary forms through a neutral access method. Four vendor teams have implemented the RFP-1 features and demonstrated basic access at the last OGC meeting in Cambridge, England.
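The idea of neutral access is that each proprietary store translates its internal geometry into one shared vocabulary of points, lines, and areas. As an illustration only, the sketch below renders such features in the Well-Known Text notation published in the OGC Simple Features specifications; it is an assumption that this matches what the RFP-1 implementations exchange, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]


def _coords(points: List[Coord]) -> str:
    """Format coordinate pairs as 'x y, x y, ...'."""
    return ", ".join(f"{x:g} {y:g}" for x, y in points)


@dataclass
class Point:
    x: float
    y: float

    def to_wkt(self) -> str:
        return f"POINT ({self.x:g} {self.y:g})"


@dataclass
class LineString:
    points: List[Coord]

    def to_wkt(self) -> str:
        return f"LINESTRING ({_coords(self.points)})"


@dataclass
class Polygon:
    ring: List[Coord]  # exterior ring only; no holes in this sketch

    def __post_init__(self):
        # An area boundary must be closed: first and last points identical.
        if len(self.ring) < 4 or self.ring[0] != self.ring[-1]:
            raise ValueError("polygon ring must be closed with at least 4 points")

    def to_wkt(self) -> str:
        return f"POLYGON (({_coords(self.ring)}))"
```

A client that understands only these three shapes can draw features from any server that emits them, whatever the server's native format.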
BBN, under contract to DARPA, has developed a remote spatial data viewing facility called OpenMap (see the URLs) that uses a Java client to access multiple data servers via CORBA. On the server side, requests are translated into the neutral RFP-1 client terminology, and the interface accesses the data and returns very basic features for display in the remote client. Through our involvement with the WWW-mapping Special Interest Group of the OpenGIS Consortium, the FGDC has channeled funds to OGC and its members to demonstrate OpenMap as one of several distributed mapping technologies on the LAN in the GIS/LIS conference exhibit hall in late October. OpenMap allows a single viewing client to connect to, and draw data simultaneously from, several distributed sources.
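The single-client, many-sources pattern can be sketched as a parallel fan-out that merges whatever layers come back. This is not OpenMap or CORBA code: the servers are hypothetical local functions standing in for remote calls, and the thread pool is simply one convenient way to query them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote map servers; a real client would make
# network (e.g. CORBA) calls here. Each returns (layer_name, features).
def roads_server(bbox):
    return "roads", ["LINESTRING (0 0, 5 5)"]


def hydro_server(bbox):
    return "hydro", ["POLYGON ((1 1, 2 1, 2 2, 1 1))"]


def broken_server(bbox):
    raise ConnectionError("server unreachable")


def fetch_all(servers, bbox, timeout=10.0):
    """Query every server in parallel; keep layers from those that answer."""
    layers, failures = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {pool.submit(srv, bbox): srv.__name__ for srv in servers}
        for fut, name in futures.items():
            try:
                layer, features = fut.result(timeout=timeout)
                layers[layer] = features
            except Exception as exc:  # one bad source must not blank the map
                failures.append((name, str(exc)))
    return layers, failures
```

The design choice worth noting is that a failing source is recorded and skipped, so the viewer still draws the layers that did arrive.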
The thread that ties this to Clearinghouse is that the data resources will be described using FGDC metadata and will be stored in mini Clearinghouse services. This integration is done in order to demonstrate Clearinghouse techniques for data discovery, including embedded linkages to "real" data online that a user can start to interact with, even without a client-side GIS. The current use of Clearinghouse by the GIS elite needs to be expanded to include the potentially vast Internet public who are not GIS-literate but would like to mix data from different sites on their screens. Another objective is to familiarize the OGC participants with the Clearinghouse as a testbed for catalog services as they prepare to issue a catalog specification in early 1998.
In response to this, a team has been assembled within the OGC to develop the interfaces to the various software systems and set up the demonstration for the GIS/LIS conference. The FGDC booth hosted a computer with the OpenMap demo and some local data for the other booths to access. It is our collective intent that these demonstrations will continue and be activated on the broader Internet after the conference for "real-world" testing and evaluation and that future GIS conferences will provide new venues to convene the showcase LAN once again.
Isite 2.00 Final
By the end of 1997, Isite Version 2.00.06 was in use at over 60 sites around the world. To search one or more of these servers, one can connect to the Alaska or Reston Clearinghouse Gateways at:
Final release of 2.00 is anticipated in early 1998 pending delivery of tested versions for all UNIX and NT platforms and full documentation of the software. The Isite software has been updated after extensive testing and review by the Naval Research Lab. The fields that it supports have also been harmonized with those used in the Government Information Locator Service (GILS) for greater interoperability in search, and the document type has been expanded to recognize and map the ANZLIC metadata standard elements so they are cross-searchable with FGDC elements. The Isite software is made available through the Clearinghouse tutorials' download pages.
Blue Angel Technologies MetaStar
Blue Angel Technologies (www.bluangel.com) has announced MetaStar, a commercial solution using the Fulcrum search engine that supports the collection, management, editing, and service of FGDC-compliant metadata on the Internet. Blue Angel's approach supports multiple views of the same metadata, presenting it in FGDC, GILS, or DIF format. No testing has yet been conducted on MetaStar to verify its interoperability with existing FGDC implementations.
FGDC Metadata Tools List available
As the result of a recent survey of metadata collection and management tools, I have prepared a table of seventeen "current" software programs that work with FGDC metadata. The table links to more detailed synopses of each program.
These software programs will be reviewed by an ad-hoc team in 1998 to prepare a more thorough external review of the available software, along the lines of the previous Mitre review, to assist users in selecting tools and to advise tool developers on enhancements. The review table and producer-supplied info is available at: http://www.fgdc.gov/metadata/toollist/metatools797.html
3. Systems Enhancements and Investigations
Gateways to the Clearinghouse
During 1997, the Alaska Geographic Data Committee site became the second official entry point, or "Gateway", into the constellation of servers known as the Clearinghouse. Additional official gateway sites are needed to offer the HTML and Java forms from sites other than the FGDC primary server, both to distribute search loads and to provide redundancy when networks are slow or inaccessible. These gateway sites are clones of the FGDC gateway, are not customized (much), and provide search access to all registered Clearinghouse sites. Gateways are expected to be installed regionally to improve access to the distributed Clearinghouse. A gateway map is now available from the FGDC site with links to these gateways as they become available.
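The load-distribution and redundancy rationale can be illustrated from the client side: prefer a gateway in one's own region, and fall back to any other gateway that responds. The gateway host names and region labels below are hypothetical placeholders (real addresses come from the gateway map), and a plain TCP connection to port 80 is assumed as a sufficient liveness check.

```python
import socket

# Hypothetical gateway list: (host, region). Not real Clearinghouse addresses.
GATEWAYS = [
    ("alaska-gateway.example", "west"),
    ("reston-gateway.example", "east"),
]


def tcp_reachable(host, port=80, timeout=3.0):
    """True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_gateway(gateways, region, probe=tcp_reachable):
    """Prefer gateways in the caller's region, then fall back to any other."""
    # sorted() is stable; same-region entries (key False) come first.
    ordered = sorted(gateways, key=lambda g: g[1] != region)
    for host, _ in ordered:
        if probe(host):
            return host
    return None  # no gateway answered
```

Because every gateway is a clone that searches all registered sites, any reachable one gives the same results; the regional preference only spreads load and shortens network paths.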
Server Status checks
To manage information about the status of all the searchable Clearinghouse Nodes, a program has been written to visit all the registered servers and produce a current status list in HTML. This feature is available at the URL: http://registry.fgdc.gov/serverstatus/
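A status checker of this kind visits each registered server and writes the result as an HTML table. The sketch below is not the FGDC program; it assumes each server is listed as `(name, host, port)` and treats a successful TCP connection to the Z39.50 port (210 is the registered default) as "up", which is weaker than completing a Z39.50 Init exchange. The `check` parameter lets the network probe be replaced, for example in testing.

```python
import html
import socket
from datetime import datetime, timezone


def z3950_port_open(host, port=210, timeout=5.0):
    """True if a TCP connection to the server's Z39.50 port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_page(servers, check=z3950_port_open):
    """Build an HTML status list for (name, host, port) entries."""
    rows = []
    for name, host, port in servers:
        state = "up" if check(host, port) else "DOWN"
        rows.append(f"<tr><td>{html.escape(name)}</td>"
                    f"<td>{html.escape(host)}:{port}</td>"
                    f"<td>{state}</td></tr>")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return ("<html><body><h1>Clearinghouse server status</h1>"
            f"<p>Checked {stamp}</p><table>"
            + "".join(rows) + "</table></body></html>")
```

Run periodically (for example from a scheduler), the generated page always reflects the most recent sweep of the registry.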
Web Addresses associated with Clearinghouse Participants
There have been several recent requests for more information on who is participating in the formal Clearinghouse. This information helps sites interested in establishing compliant, searchable Nodes, and helps identify sites "in your neighborhood" to which you can turn for advice. The list of searchable nodes has been available from the Clearinghouse interface as the "Status" link. I have also just prepared a program that retrieves the websites referenced by registered Clearinghouse Nodes. This way one can use the Clearinghouse to search for spatial data, and then follow links to participating organizations to explore ancillary information they present on their websites but do not make searchable through dataset-level metadata. The Clearinghouse thus provides the basic search capability, and the website list supports browsing and contacting the participating sites.
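One way to derive such a website list is to scan each node's metadata for URLs (for instance in `onlink` Online Linkage elements) and reduce them to distinct site roots. This is a hedged sketch, not the program described above; the simple URL pattern is an assumption and will miss unusual forms.

```python
import re
from urllib.parse import urlsplit

# Simple pattern: http(s) URL up to whitespace, a quote, or a tag bracket.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def websites(metadata_docs):
    """Return sorted distinct scheme://host/ roots referenced in the documents."""
    sites = set()
    for doc in metadata_docs:
        for url in URL_RE.findall(doc):
            # Drop trailing punctuation picked up from surrounding prose.
            parts = urlsplit(url.rstrip(".,;)"))
            if parts.hostname:  # hostname is lowercased by urlsplit
                sites.add(f"{parts.scheme}://{parts.hostname}/")
    return sorted(sites)
```

Reducing each link to its host collapses hundreds of dataset URLs into the handful of organizational websites a visitor would want to browse.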
Please visit the query interface and new website listing from the Clearinghouse page of the FGDC at http://www.fgdc.gov/clearinghouse/clearinghouse.html
A snapshot Clearinghouse CD-ROM is being developed to house a selection of metadata from over 25 Clearinghouse metadata collections and to provide a lookalike interface to the metadata that can be run on a standalone Windows95 PC. The Java-based user interface has been recast as a Windows graphical user interface that interacts with a database index of metadata on the CD. Given slow network speeds, searching the CD is likely to be as fast as, or faster than, what is frequently experienced on the Internet. The intent of this CD is to promote awareness and use of the Clearinghouse and to provide a self-contained demonstration of the technology that can be run off-network. The synoptic CD reflects contributions of over 30 agencies and nearly 30,000 distinct metadata entries, captured in the fall 1997 timeframe. It should be ready for public distribution in April 1998.
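The CD's actual index structure is not described here, but the general technique that makes offline full-text search fast is an inverted index: each word maps to the set of entries containing it, so a query intersects a few small sets instead of scanning 30,000 documents. A minimal sketch, assuming entries are plain text keyed by an identifier:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries):
    """entries: {entry_id: full_text}. Returns term -> set of entry ids."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for term in tokenize(text):
            index[term].add(entry_id)
    return index


def search(index, query):
    """AND query: ids of entries containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get() avoids inserting empty sets for unknown terms.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The index is built once when the CD is mastered, so at search time only lookups and set intersections are needed.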
In an effort to evaluate the current state of theme keywords as they exist in FGDC metadata under various Clearinghouse sites, and appraise the potential use of more standard thesaurus or formalized keywords, a request has been sent to the various Clearinghouse node administrators to compile a listing of theme keywords extracted from their metadata documents.
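Compiling such a listing mostly means pulling the `themekey` values out of each document and tallying them. The sketch assumes every `themekey` element has an explicit end tag (SGML metadata that omits end tags would need a real parser) and folds case and whitespace only when counting.

```python
import re
from collections import Counter

# Assumes <themekey>...</themekey> with an explicit closing tag.
THEMEKEY_RE = re.compile(r"<themekey>\s*(.*?)\s*</themekey>",
                         re.IGNORECASE | re.DOTALL)


def theme_keywords(doc):
    """Theme keywords in one metadata document, as written by the producer."""
    return THEMEKEY_RE.findall(doc)


def tally(docs):
    """Count keywords across documents, ignoring case and extra whitespace."""
    counts = Counter()
    for doc in docs:
        counts.update(" ".join(k.lower().split()) for k in theme_keywords(doc))
    return counts
```

Keeping the raw list per document alongside the normalized tally preserves what producers actually wrote while still showing which terms recur.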
There is still a fundamental tension between keyword search and full-text search. In my logfiles of queries, virtually nobody searches on theme keywords; users rely instead on full-text search for words. I think this comes partly from the predominant behavior on the Net of general text queries, and partly from users' little or no experience with databases where keywords existed as legacy search devices. Theme keywords seem to make the most sense in very small metadata entries, where full-text content may be light and producer and user must rely on keywords for any meaningful search. On the other hand, the contents of the keywords are themselves searchable as full text, and often occur one or more times elsewhere in the body of the text.
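The last point can be measured per entry: what fraction of an entry's theme keywords already appear in its body text? A keyword that is already present adds little for a full-text searcher, while low coverage (typical of very small entries) marks the cases where keywords carry real search value. A sketch, assuming keywords and body are plain strings and counting a multi-word keyword as covered only when all of its words occur:

```python
import re


def words(text):
    """Set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_coverage(keywords, body):
    """Fraction of keywords whose every word already appears in the body."""
    if not keywords:
        return 0.0
    body_words = words(body)
    hits = sum(1 for k in keywords if words(k) and words(k) <= body_words)
    return hits / len(keywords)
```

Run over a node's holdings, low-coverage entries identify where keyword quality matters most.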
This exercise is intended to assess the ability to categorize our spatial data holdings based on an assumed common (union) vocabulary. Finding a common set of terms is not easy, even within disciplines...
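Given the keyword listings returned by each node, the union vocabulary and the terms shared by every node can be computed directly. The sketch normalizes only case and whitespace, so synonyms such as "streams" and "rivers" remain distinct terms; that gap is one concrete reason a common set of terms is hard to find.

```python
def normalize(term):
    """Lowercase and collapse internal whitespace."""
    return " ".join(term.lower().split())


def vocabulary_report(node_keywords):
    """node_keywords: {node: iterable of theme keywords}.

    Returns (union of all terms, terms used by every node).
    """
    sets = {node: {normalize(t) for t in terms}
            for node, terms in node_keywords.items()}
    union = set().union(*sets.values())
    shared = set.intersection(*sets.values()) if sets else set()
    return union, shared
```

A large union with a small shared core would suggest that categorizing holdings requires mapping node vocabularies onto a controlled thesaurus rather than relying on the terms producers already use.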