


        




            
 
FACIT

Federated Archive Cyberinfrastructure Testbed


A robust and flexible approach to archive interoperability has long been pursued by the digital preservation community because of its essential connection to two key goals. First, in order for different archives to federate their resources and collections, they must interoperate at some level. The more smoothly and completely they can interoperate, the more productive and mutually beneficial their federation can be. Second, interoperability is also necessary to enable the multilayered series of transitions, or “handoffs,” that long-term archives are bound to experience as, over time, their content is moved to new storage media, new management systems, new policy regimes, and so on. In this project, we propose to demonstrate a Federated Archive Cyberinfrastructure Testbed (FACIT) that builds on a new architecture for digital preservation designed to significantly enhance the ability of archives to interoperate. Using FACIT, we expect to be able to show how this architectural approach can facilitate federated resource sharing, redundancy, and improved access. FACIT will also demonstrate the retention of data semantics, persistent association of data with those semantics, migration of files to new formats, and archive-to-archive content migration.

A leading feature of the FACIT architecture is its strong adherence to the principle of layering for modularity. This is especially apparent in the clear separation it makes between generic data stewardship at the bit level, which involves the management of physical resources for back-up, replication, and other aspects of data logistics, and content stewardship at the object level, which focuses on the structure and semantics of archive content as logical objects that can exist in replica.


This layered approach gives us confidence that, while our demonstration will concentrate on geospatial data as content, the same architecture will apply equally well to any of the other content types and storage technologies being considered by the Library of Congress.


At the archive object level, FACIT will comprise a minimum of two independent archives (denoted A and B in the attached diagram) holding separate content, and a format registry (denoted F) holding format specifications, metadata, and other contextual information that supports long-term reuse and reconstruction of the content in archives A and B. The format registry will itself be implemented as an archive; in particular, format specifications will themselves be represented and stored as archival objects. The format registry archive will be built on top of the same FACIT storage infrastructure (described below) and, owing to its importance in the preservation system, will be mirrored at both archives A and B. Archival objects in archives A and B will reference formats in the registry.
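The relationship between content archives and the format registry can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NGDA data model: the class names, the `format_ref` field, and the identifiers such as `F:geotiff` are hypothetical. It shows the registry being an ordinary archive whose objects happen to be format specifications, and a content object resolving its format through either the registry or a mirror of it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchivalObject:
    """Hypothetical archival object: an id, a payload, and an optional format reference."""
    object_id: str
    payload: bytes
    format_ref: Optional[str] = None  # id of a format object held in the registry


@dataclass
class Archive:
    """Hypothetical archive: a named collection of archival objects."""
    name: str
    objects: dict = field(default_factory=dict)

    def ingest(self, obj: ArchivalObject) -> None:
        self.objects[obj.object_id] = obj


def resolve_format(obj: ArchivalObject, registry: Archive) -> Optional[ArchivalObject]:
    """Look up the format specification that an object references, if any."""
    if obj.format_ref is None:
        return None
    return registry.objects[obj.format_ref]


# The registry F is itself an archive; format specifications are archival objects.
registry = Archive("F")
registry.ingest(ArchivalObject("F:geotiff", b"(format specification text)"))

# Archive A holds content that references a format in the registry.
archive_a = Archive("A")
map_object = ArchivalObject("A:map-001", b"(raster bytes)", format_ref="F:geotiff")
archive_a.ingest(map_object)

# A mirror of the registry held at archive A resolves the same reference.
registry_mirror_at_a = Archive("F (mirror at A)", dict(registry.objects))
```

Because the registry uses the same object type as the content archives, any mechanism that stores, mirrors, or exports archival objects applies to format specifications without special handling.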

For this demonstration, each FACIT archive will represent content using a data model that is compliant with the NGDA data model rules (the NGDA data model is a physical implementation of the OAIS logical model), though the detailed implementations of the respective archives are explicitly allowed to vary. Each archive will also support an export service that exports archival objects in a standard serialization that is yet to be determined. In addition, each archive will support a file-system view of its content that can be crawled by a web crawler. As an example of services being built on top of archives, the content in archives A and B will be exported and then automatically ingested into and indexed by the Alexandria Digital Library (a set of services on top of the NGDA repositories), thereby providing spatiotemporal search over the entire archive federation.

At the data stewardship level, i.e., at the level of the “bits-are-bits” storage infrastructure, FACIT will build on logistical networking technology, which uses the Internet Backplane Protocol (IBP) to provide a highly generic interface for managing network storage resources, called depots. IBP is the common service in FACIT for accessing storage resources, and as such is a fundamental point of interoperability. Each FACIT archive will use L-Store, a virtual file system built on IBP and other logistical networking technologies, to manage data storage in both its private infrastructure and in the shared storage pool that the federation makes available. Using L-Store, and leveraging the low-level interoperability provided by IBP, FACIT archives will (from the archives' perspective) automatically and transparently mirror each other's content to provide fault tolerance, increased accessibility, and similar benefits. In addition, FACIT archives will participate in the larger REDDnet storage network, which expects to have more than 500 TB of storage distributed nationally by the end of 2008. Since REDDnet is based on IBP and L-Store, FACIT archives will have seamless access to this larger, shared pool of storage.
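The mirroring idea at the bit level can be illustrated with a toy model. The sketch below does not use IBP or L-Store and does not reflect their interfaces; the `Depot` class, the `mirror` and `read_with_failover` functions, and the use of SHA-256 content hashes as keys are all assumptions made for illustration. It shows only the principle: depots hold opaque bytes, two archives copy each other's blocks, and a read succeeds as long as one replica survives.

```python
import hashlib


class Depot:
    """Toy stand-in for a storage depot: opaque bytes keyed by content hash."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def load(self, key: str) -> bytes:
        return self.blocks[key]


def mirror(source: Depot, target: Depot) -> int:
    """Copy every block the target lacks; return the number of blocks copied."""
    copied = 0
    for key, data in source.blocks.items():
        if key not in target.blocks:
            target.blocks[key] = data
            copied += 1
    return copied


def read_with_failover(key: str, depots) -> bytes:
    """Return the block from the first depot that still holds it."""
    for depot in depots:
        if key in depot.blocks:
            return depot.load(key)
    raise KeyError(key)


# Two archives, each with its own depot, mirror each other's content.
depot_a = Depot("A")
depot_b = Depot("B")
key_a = depot_a.store(b"content held by archive A")
key_b = depot_b.store(b"content held by archive B")
copied_to_b = mirror(depot_a, depot_b)
copied_to_a = mirror(depot_b, depot_a)

# Simulate the loss of depot A; its content remains readable from depot B.
depot_a.blocks.clear()
recovered = read_with_failover(key_a, [depot_a, depot_b])
```

Nothing in the depot model knows what the bytes mean, which is the point of separating bit-level stewardship from object-level stewardship.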

We expect that, when fully implemented, FACIT will be able to demonstrate a number of features we believe to be desirable in a preservation architecture. At the storage level, it will demonstrate the use of protocol-based, logistical networking storage as an archive substrate. This storage technology will make it possible, for example, for a third archive to enter the federation and to easily participate in mirroring arrangements with the existing members. Alternatively, the storage technology will make it possible for an archive to move or replicate its content (or parts thereof), either for the temporary purpose of averting an oncoming natural disaster or for the permanent purpose of handing off responsibility to a new archive or organization. We explicitly note that a third archive in this testbed (Archive C in the diagram) may be hosted at the Library of Congress. In addition, the use of mirrored storage across storage depots provides higher-bandwidth access to archived content, a significant boost given that archives typically use storage systems designed for reliability and redundancy, not throughput. Finally, the embedding of archive storage in a larger network of storage depots provides seamless access to temporary storage, thus supporting temporary processing such as file transformations, format migrations, geospatial reprojections, and the like.

The testbed will also demonstrate a whole-system approach to the handling of format and other contextual information. In the proposed architecture, the format registry is a separate entity (reflecting the fact that registries are likely to be centralized community resources). At the same time, the registry is itself an archive (reflecting the fact that format information needs to be preserved along with primary data) that is networked with other archives. In particular, the aforementioned archive export service will make it possible to automatically traverse the dependency graph for an archival object, thereby gathering all information necessary for the object's reconstruction.
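The dependency traversal described above can be sketched as a breadth-first walk. This is a minimal illustration under assumptions, not the FACIT export service: the in-memory `federation` dictionary, the `depends_on` field, and all identifiers are hypothetical. Starting from one archival object, the function collects every object (content, formats, and formats those formats rely on) needed to reconstruct it.

```python
from collections import deque


def gather_dependencies(root_id, objects):
    """Return the root id and every id it transitively depends on, breadth-first.

    `objects` maps an object id to a record whose "depends_on" entry lists
    the ids that object references (for example, its format specification).
    """
    seen = {root_id}
    order = [root_id]
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for dependency in objects[current]["depends_on"]:
            if dependency not in seen:  # guards against cycles and repeats
                seen.add(dependency)
                order.append(dependency)
                queue.append(dependency)
    return order


# Hypothetical federation contents: archives A and B plus the registry F.
federation = {
    "A:map-001": {"depends_on": ["F:geotiff"]},
    "F:geotiff": {"depends_on": ["F:tiff"]},
    "F:tiff": {"depends_on": []},
    "B:survey-042": {"depends_on": ["F:shapefile"]},
    "F:shapefile": {"depends_on": []},
}

export_set = gather_dependencies("A:map-001", federation)
```

Because format specifications are stored as archival objects, the walk treats content-to-format and format-to-format references in the same way.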

Finally, as time and resources permit, the testbed will demonstrate the use of a common archive data model serialization format and an object export service.  The combination of these two features would support total archive handoff and reconstruction. We anticipate a demonstration prototype by summer 2008.


                                                                                                                                                       
 
 
      
Copyright © 2005-2009
University of California,
Santa Barbara, CA 93106
(805) 893-8000
Last Modified
July 31, 2007


 


 
