This is a DRAFT

Identifiers for digital resources

Libraries traditionally use unique identifiers for physical items in their collections -- by assigning call numbers and sticking labels on covers. A reader who identifies a book in a catalog can retrieve it by going to the shelf and looking for the label. The call number is the "key" that links the catalog record to the item it identifies. When the item is moved, its identifier goes with it. If library shelves are re-organized, individual call numbers need not be changed, only the signs on the shelves. In conjunction with a map of the stacks, these signs provide vital access support for the reader (or, in the case of closed stacks, for the deck attendant). The map helps the reader "resolve" the call number into a physical location.

Digital resources must also be identified uniquely. Until recently, no attempt was made to provide standard names for digital resources in general, except for very limited applications or in closed systems, such as within a single database. However, a digital library built for the long term cannot be a closed system. It must be built out of modular components that can be supplemented and upgraded as new technology is developed. As in the traditional collection, the name for an item in the digital library will be the "key" that links catalogs, compilations, and references to the item itself. Figure 1 represents the modular design for supporting access to NDL collections: the user interface with which the patron interacts, the tools for access (catalogs, free-text indexes, finding aids, etc.) that support that interface, and the archive that contains the digital collections. Any item in the digital archive can be reached through several of these tools, so the item must have a unique name that all references to it can use.

Figure 1. Primary access paths to the NDL collections

Diagrams are almost essential to this document.

The characteristics of digital resources pose challenges for naming. A file has no "cover" on which a label can be permanently fixed so that it stays with the file, and remains visible to any user, even when the file is copied to another computer. The only "cover" for a generic file is its filename. For digital resources that comprise a single file and have a fixed location, filenames do provide a basis for naming. Every file attached to a particular computer, whatever its operating system, must have a name, and the full name (which includes the "path" of nested directories in which the file is stored) must be unique within that computer's file system. Since every computer on the Internet (or any other computer network) must have an identifier unique across the network, the combination of computer identifier and full filename provides a unique identifier for any file on the Internet.
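
The two-part identifier described above can be sketched in a few lines of Python. This is only an illustration: the function name file_identifier, the host names, and the file path are all hypothetical and do not come from any NDL system.

```python
from pathlib import PurePosixPath


def file_identifier(computer, path):
    """Combine a network-unique computer name with a full file path."""
    full_path = PurePosixPath(path)
    # Only the full name, starting from the root directory, is unique
    # within one computer's file system.
    if not full_path.is_absolute():
        raise ValueError("the full path, starting from the root, is required")
    return (computer, str(full_path))


# Hypothetical host names and path, for illustration only.
a = file_identifier("host-a.example.org", "/reports/1996/summary.txt")
b = file_identifier("host-b.example.org", "/reports/1996/summary.txt")
print(a != b)  # the same path on two computers yields two distinct identifiers
```

The sketch also shows the weakness discussed later on this page: the identifier changes as soon as the file is moved to another computer or directory.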

The World Wide Web uses Uniform Resource Locators (URLs) as names

Uniform Resource Locators (URLs), the identifiers used on the World Wide Web (WWW) today, generalize the two-part identifier (computer name, file name) by adding a third component that specifies the network protocol to be used to access the file. The addition of the protocol component to the identifier allows names to be given to resources that are not files, such as interactive terminal sessions or database query forms. Some links below point to more details about URLs. The URL approach has proved powerful and flexible for identifying Internet resources and is one of three building blocks on which the World Wide Web is based. The World Wide Web has been a phenomenal success because each of those building blocks was simple to understand, implement, and use in the Internet environment of the early 1990s. The American Memory project took advantage of the WWW environment to provide access to its historical collections across the Internet. Each item in the collections is accessed through its URL.
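
The three components of a URL can be seen by taking one apart. The following is a minimal sketch using urlsplit from Python's standard urllib.parse module. The helper name split_url and the file path in the example are assumptions made for illustration; only the computer name lcweb.loc.gov appears elsewhere on this page.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return the (protocol, computer name, file name) parts of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


# Hypothetical URL: the path is invented for this example.
protocol, computer, filename = split_url("http://lcweb.loc.gov/collections/item.html")
print(protocol)  # http
print(computer)  # lcweb.loc.gov
print(filename)  # /collections/item.html
```

Because the computer name and file name are part of the identifier itself, any change to either one produces a different URL.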

Figure 2. Access to the NDL collections in early 1996 -- using URLs

However, there is a problem with URLs as long-term identifiers for digital items and resources. The URL incorporates the names of the computer and files that hold the resource. When a file or resource is moved (perhaps because a computer has failed or no longer has the capacity to handle user demand), the URL is no longer valid. Regular users of the WWW routinely come across links that lead nowhere. In some cases, old links lead to documents that apologize for the inconvenience and provide a link to the new URL, but that is hardly a reliable approach for the long term.

Uniform Resource NAMES (URNs) will be valid for the long term

The WWW community recognizes the shortcoming of URLs and has developed the concept of a Uniform Resource Name (URN). A URN is valid for the long term and independent of location, while still being globally unique. Several promising schemes for implementing a system of URNs have emerged. They address the form for names, methods to guarantee global uniqueness, and the design and deployment of a distributed system that provides an efficient address lookup function to "resolve" URNs into pointers to actual locations, with capabilities for publishers/authors/librarians to manage "their" names.
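
The idea of resolving a location-independent name into a current location can be illustrated with a toy lookup table. This sketch is not the design of any of the URN proposals; it assumes a single in-memory table, and the class name NameResolver, the name "loc.ndl/item-0001", and the URLs are all hypothetical.

```python
class NameResolver:
    """Toy name authority: maps persistent names to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Assign a location to a new name; each name may be issued once."""
        if name in self._locations:
            raise ValueError(f"name already registered: {name}")
        self._locations[name] = url

    def move(self, name, new_url):
        """Record that the resource has moved; the name itself is unchanged."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current location for a name."""
        return self._locations[name]


resolver = NameResolver()
# Hypothetical name and URLs, for illustration only.
resolver.register("loc.ndl/item-0001", "http://server-a.example.org/archive/item-0001.html")
resolver.move("loc.ndl/item-0001", "http://server-b.example.org/store/item-0001.html")
print(resolver.resolve("loc.ndl/item-0001"))
```

Catalogs and other references store only the name. When an item moves, the owner of the name updates one table entry, and every existing reference continues to work. A real system would distribute this table across many servers, which is the part the proposals differ on.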

At the December 1995 meeting of the Internet Engineering Task Force (IETF), the most active groups with proposals agreed to deploy a variety of systems in a way that allows them to work together and be tested through use by the Internet community. For summaries of the state of URN standardization, see Naming conventions for Digital Resources (by Rebecca Guenther of LC's Network Development and MARC Standards Office, January 3, 1996) and Uniform Resource Names: a progress report (by URN implementors, February 1996). There are also links below to the proposals for URN schemes and related materials.

The URN proposals have some commonalities:

Figure 3. Access to the NDL collections using URNs

The implication is that by running its own name authority system, LC can use local naming conventions that support the production, indexing, and management of locally produced digital collections and are compatible with cataloging norms.

LC is working during 1996 with CNRI (Corporation for National Research Initiatives) on a prototype digital archive, based on the Handle System (CNRI's URN scheme) and a Repository that supports management and access control for items in the digital collections. This design can coexist with the current approach, as shown in Figure 4.

Figure 4. Access to the NDL collections using the digital archive prototype

Delve deeper for further discussion of:


Related reading from outside the Library of Congress:

URLs -- the identifiers currently used on the World Wide Web

Requirements for URNs

Proposals and implementations for URNs

In February 1996, a group of implementors of schemes for Uniform Resource Names issued a joint progress report in D-Lib Magazine at http://www.dlib.org/dlib/february96/02arms.html. Links to information on individual URN proposals follow:

* CNRI's Handle System
CNRI's June 1995 proposal emphasizes persistent names and their management, with the expectation that names will usually be maintained by the publisher or author of the digital resources. The handle server is in use for the CORDS project for electronic copyright deposit at LC and for the NCSTRL distributed library of technical reports in computer science. It is also a component in the digital archive prototype under development for NDLP. In each of these projects, it will be used in conjunction with a repository structure that supports management and access control for an archive of digital resources.

* OCLC's proposal for PURLs (Permanent URLs)
OCLC implemented a naming scheme in late 1995 in conjunction with the InterCat project for "Cataloging the Internet," a project that supports the description of and access to Internet resources. InterCat PURLs use the OCLC InterCat record number as part of the identifier. The scheme implemented is an interim approach to a system of URNs, part of a broader vision described in an Internet draft earlier in the year.

* The Path URN proposal from NCSA (National Center for Supercomputing Applications)
(NCSA is the organization that developed Mosaic, the first widely used graphical browser for the World Wide Web.) The Path URN proposal is based on extensions to the existing Domain Name System (DNS). [DNS is a basic Internet service that handles "resolution" of domain names (such as lcweb.loc.gov) into the corresponding numeric "IP" addresses (in this case, 140.147.248.7).]

* The x-dns-2 URN Scheme



Identifiers -- This is a DRAFT
(2/19/96)