This layered approach gives us confidence that, while our demonstration will concentrate on geospatial data as content, the same architecture will apply equally well to any of the other content types and storage technologies being considered by the Library of Congress.
At the archive object level, FACIT will comprise at least two independent archives (denoted A and B in the attached diagram) holding separate content, and a format registry (denoted F) holding format specifications, metadata, and other contextual information that supports long-term reuse and reconstruction of the content in archives A and B. The format registry will itself be implemented as an archive; in particular, format specifications will be represented and stored as archival objects. The registry archive will be built on the same FACIT storage infrastructure (described below) and, owing to its importance in the preservation system, will be mirrored at both archives A and B. Archival objects in archives A and B will reference formats in the registry.
For this demonstration, each FACIT archive will represent content using a data model that complies with the NGDA data model rules (the NGDA data model is a physical implementation of the OAIS logical model), though the detailed implementations of the respective archives are explicitly allowed to vary. Each archive will also support an export service that exports archival objects in a standard serialization, yet to be determined. In addition, each archive will support a file-system view of its content that can be crawled by a web crawler. As an example of services built on top of archives, the content in archives A and B will be exported and then automatically ingested into and indexed by the Alexandria Digital Library (a set of services on top of the NGDA repositories), thereby providing spatiotemporal search over the entire archive federation.
At the data stewardship level, i.e., at the level of the “bits-are-bits” storage infrastructure, FACIT will build on logistical networking technology, which uses the Internet Backplane Protocol (IBP) to provide a highly generic interface for managing network storage resources, called depots. IBP is the common service in FACIT for accessing storage resources, and as such is a fundamental point of interoperability. Each FACIT archive will use L-Store, a virtual file system built on IBP and other logistical networking technologies, to manage data storage both in its private infrastructure and in the shared storage pool that the federation makes available. Using L-Store, and leveraging the low-level interoperability provided by IBP, FACIT archives will mirror each other's content automatically and, from the archives' perspective, transparently, providing fault tolerance and increased accessibility. In addition, FACIT archives will participate in the larger REDDnet storage network, which expects to have more than 500 TB of storage distributed nationally by the end of 2008. Since REDDnet is based on IBP and L-Store, FACIT archives will have seamless access to this larger, shared pool of storage.
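The mirroring behavior described above can be illustrated with a minimal sketch. It is not the IBP or L-Store interface: the Depot class, the mirror and read_any functions, and the use of SHA-256 content hashes as block keys are all assumptions made for illustration, with depots modeled as in-memory dictionaries.

```python
import hashlib


class Depot:
    """Hypothetical in-memory stand-in for a storage depot (not the IBP API)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # content hash -> bytes

    def store(self, data):
        """Store a block and return its content hash, used here as the key."""
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def load(self, key):
        """Return the block for key, or None if this depot does not hold it."""
        return self.blocks.get(key)


def mirror(source, targets):
    """Copy every block in source that a target lacks; return the copy count."""
    copied = 0
    for key, data in source.blocks.items():
        for target in targets:
            if key not in target.blocks:
                target.blocks[key] = data
                copied += 1
    return copied


def read_any(key, depots):
    """Return the first copy whose hash still matches its key."""
    for depot in depots:
        data = depot.load(key)
        if data is not None and hashlib.sha256(data).hexdigest() == key:
            return data
    raise KeyError(key)


# Archive A stores a block, the block is mirrored to archive B's depot,
# and it remains readable after archive A's depot loses its copy.
depot_a = Depot("archive-A")
depot_b = Depot("archive-B")
block_key = depot_a.store(b"geospatial tile 42")
first_pass = mirror(depot_a, [depot_b])
second_pass = mirror(depot_a, [depot_b])  # nothing left to copy
depot_a.blocks.clear()  # simulate loss at archive A
recovered = read_any(block_key, [depot_a, depot_b])
```

Keying blocks by content hash is a design choice of this sketch: it makes repeated mirroring idempotent and lets a reader verify whichever copy it retrieves.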
We expect that, when fully implemented, FACIT will be able to demonstrate a number of features we believe to be desirable in a preservation architecture. At the storage level, it will demonstrate the use of protocol-based, logistical networking storage as an archive substrate. This storage technology will make it possible, for example, for a third archive to enter the federation and easily participate in mirroring arrangements with the existing members. Alternatively, it will make it possible for an archive to move or replicate all or part of its content, whether temporarily, to protect it from an approaching natural disaster, or permanently, to hand off responsibility to a new archive or organization. We explicitly note that a third archive in this testbed (Archive C in the diagram) may be hosted at the Library of Congress. In addition, the use of mirrored storage across storage depots provides higher-bandwidth access to archived content, a significant boost given that archives typically use storage systems designed for reliability and redundancy, not throughput. Finally, the embedding of archive storage in a larger network of storage depots provides seamless access to temporary storage, thus supporting temporary processing such as file transformations, format migrations, geospatial reprojections, and the like.
The testbed will also demonstrate a whole-system approach to the handling of format and other contextual information. In the proposed architecture, the format registry is a separate entity (reflecting the fact that registries are likely to be centralized community resources). At the same time, the registry is itself an archive (reflecting that format information needs to be preserved along with primary data) that is networked with other archives. In particular, the aforementioned archive export service will make it possible to automatically traverse the dependency graph for an archival object, thereby gathering all information necessary for the object's reconstruction.
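As an illustration of the dependency traversal described above, the following minimal sketch gathers everything an archival object transitively depends on. The ArchivalObject class, the gather_dependencies function, and the identifiers (such as "A:map-001" and "F:geotiff") are hypothetical; the actual data model and export serialization remain to be determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivalObject:
    """Hypothetical archival object: an identifier plus the ids it references."""
    object_id: str
    depends_on: tuple = ()


def gather_dependencies(root_id, lookup):
    """Return the root id and every id reachable from it, in first-visit order.

    lookup maps an id to its ArchivalObject; a reference to an unknown id
    raises KeyError, which signals an incomplete export.
    """
    ordered = []
    visited = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # already gathered; also guards against cycles
        visited.add(current)
        ordered.append(current)
        # Push in reverse so references are visited in their declared order.
        stack.extend(reversed(lookup[current].depends_on))
    return ordered


# Assumed sample federation: content objects in archives A and B reference
# format specifications held as archival objects in registry F.
objects = {
    "A:map-001": ArchivalObject("A:map-001", ("F:geotiff",)),
    "B:parcels-007": ArchivalObject("B:parcels-007", ("F:shapefile",)),
    "F:geotiff": ArchivalObject("F:geotiff", ("F:tiff",)),
    "F:tiff": ArchivalObject("F:tiff"),
    "F:shapefile": ArchivalObject("F:shapefile"),
}
closure = gather_dependencies("A:map-001", objects)
```

With the sample data above, the traversal for "A:map-001" gathers "A:map-001", "F:geotiff", and "F:tiff", and leaves the unrelated shapefile format out of the export.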
Finally, as time and resources permit, the testbed will demonstrate the use of a common archive data model serialization format and an object export service. The combination of these two features would support total archive handoff and reconstruction. We anticipate a demonstration prototype by summer 2008.