Note: The NDL Digital Archive Naming Group has used the term "Document Locator System" to refer to the combined function of the two upper layers of this diagram.
Each item in the digital archive must have a name, a unique identifier. A user (or a computer program on behalf of a user) will find this identifier in any one of a number of ways, such as
Given the name for an item, the address lookup function will provide an address, a physical location for the digital collection from which the item can be retrieved. When an item is moved from one computer to another, a single change in the address lookup system will ensure that all references to the name will still point to the item.
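The indirection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `AddressLookup`, its method names, and the example name and address strings are hypothetical and do not describe any actual NDL system.

```python
class AddressLookup:
    """Hypothetical name-to-address table (illustrative only)."""

    def __init__(self):
        self._table = {}  # name -> current address

    def register(self, name, address):
        # Names are unique identifiers, so refuse duplicates.
        if name in self._table:
            raise ValueError(f"name already registered: {name}")
        self._table[name] = address

    def move(self, name, new_address):
        # The single change needed when an item moves to another computer.
        if name not in self._table:
            raise KeyError(name)
        self._table[name] = new_address

    def resolve(self, name):
        # Return the current address for a name.
        return self._table[name]


lookup = AddressLookup()
lookup.register("item-0001", "server-a:/collections/item-0001")

# Two independent references hold only the name, never the address.
references = ["item-0001", "item-0001"]

# Moving the item is one update; the references themselves are untouched.
lookup.move("item-0001", "server-b:/archive/item-0001")
resolved = [lookup.resolve(name) for name in references]
```

Because references store names rather than addresses, both entries in `resolved` point at the new location after the single call to `move`.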
The archive should supply the item to the user in an appropriate format subject to rights of access relating to copyright, privacy, or terms of gifts or licenses.
The archive must be able to deal with items (or objects) in a variety of digital formats: those used widely today (such as JPEG and GIF for images, and AVI for movie segments), those in the R&D phase (such as wavelet technology, which supports regeneration of images at various levels of resolution from the same stored information), and those not yet invented. The archive should allow objects to contain other objects and should recognize when objects are different manifestations of the same intellectual work. In the long term, the archive must be able to handle items managed by different groups and stored on different types of computer.
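One possible way to model these requirements is sketched below in Python. The names `ArchiveObject`, `work_id`, and `manifestations`, and all example identifiers, are assumptions made for illustration. The format is kept as an open-ended string so that formats not yet invented need no change to the structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchiveObject:
    """Hypothetical archive object (illustrative only)."""
    name: str                          # unique identifier
    media_type: Optional[str] = None   # free-form format label; None for pure containers
    work_id: Optional[str] = None      # shared by manifestations of one intellectual work
    parts: List["ArchiveObject"] = field(default_factory=list)  # contained objects

    def walk(self):
        # Yield this object, then every object nested inside it.
        yield self
        for part in self.parts:
            yield from part.walk()


def manifestations(root, work_id):
    # Names of all objects under root that manifest the given work.
    return [obj.name for obj in root.walk() if obj.work_id == work_id]


page_gif = ArchiveObject("page1-gif", "image/gif", work_id="work-42")
page_jpeg = ArchiveObject("page1-jpeg", "image/jpeg", work_id="work-42")
clip = ArchiveObject("clip1", "video/x-msvideo", work_id="work-77")
collection = ArchiveObject("collection-a", parts=[page_gif, page_jpeg, clip])

same_work = manifestations(collection, "work-42")
```

Here the GIF and JPEG versions of the same page are recognized as two manifestations of one work, while the container object holds all three items.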
The digital archive must also support the maintenance and preservation of the digital collection.
Digital archives will be very large compared with traditional databases. Whatever the format and however sophisticated the compression schemes adopted, the storage required for digital objects is much greater than the storage required for the corresponding bibliographic records. It took over 20 years from the time that LC started putting bibliographic records into the MUMS catalog in 1968 for the disk capacity used on the IBM mainframe system to reach 142 gigabytes in 1992. On the UNIX-based servers that have been used since 1993 to store digital content (including the digitized historical collections, the MARVEL gopher service, and the THOMAS legislative information service), disk use had already reached 155 gigabytes by the end of 1995. Estimates for NDLP storage requirements by the end of 1996 are for 1 terabyte (1,000 gigabytes) without including any digitized maps. If plans for the Geography & Map Division to start digitizing maps are included, the estimate rises to 8 terabytes!
As with the other collections at the Library of Congress, some items will be used very seldom, and a delay in retrieval would be an acceptable trade-off for savings in storage cost. In general, the cost of storage (per megabyte) goes down as the average retrieval time for a file goes up. Some NDLP files could therefore be stored on a cheaper, slower medium (perhaps optical disk instead of magnetic disk). For example, storage is needed for the uncompressed high-resolution images that are not accessible through the primary WWW interface to the historical collections today because of limitations in network capacity (bandwidth) and workstations. The uncompressed images might be made available for special purposes: to cooperating institutions as part of joint projects, or to generate high-quality prints through an automated process. For such uses, a slightly longer retrieval time would be acceptable.
One building-block for the archive is a mechanism for automatic allocation and re-allocation of files across different types of storage. The system would detect which items are used most often and should be most easily accessible, and which items are seldom used and can be stored on a slower, but less expensive type of storage. This storage management system should operate in such a way that when a file is moved, no information within the collection management or address lookup systems needs to be changed.
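A minimal sketch of such a mechanism follows, in Python. It assumes just two tiers, a fixed number of slots on the fast tier, and a simple access count as the measure of use; the class `TieredStore` and the file names are hypothetical. A real storage manager would also have to copy the data and weigh file sizes and recency.

```python
class TieredStore:
    """Hypothetical two-tier store; names stay fixed while the tier changes."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # max number of files on the fast tier
        self.tier = {}      # name -> "fast" or "slow"
        self.accesses = {}  # name -> number of reads so far

    def add(self, name):
        # New files start on the cheaper, slower tier.
        self.tier[name] = "slow"
        self.accesses[name] = 0

    def read(self, name):
        # Record the access and report which tier served it.
        self.accesses[name] += 1
        return self.tier[name]

    def rebalance(self):
        # Most-used files go to the fast tier; ties are broken by name.
        ranked = sorted(self.accesses, key=lambda n: (-self.accesses[n], n))
        hot = set(ranked[:self.fast_capacity])
        for name in self.tier:
            self.tier[name] = "fast" if name in hot else "slow"


store = TieredStore(fast_capacity=1)
for name in ("map-001", "photo-002"):
    store.add(name)

for _ in range(5):
    store.read("photo-002")
store.read("map-001")

store.rebalance()
```

Callers only ever pass names to `read`, so reallocating a file between tiers changes nothing that the collection management or address lookup layers would need to know about.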
Peter Graham, Associate University Librarian for Technical and Networked Information Services for Rutgers University Libraries, discusses two tasks associated with developing a Digital Research Library. The first is establishing the repository of digital scholarly materials; the second is providing tools for access.
Digital Archive Structure (1/25/96)