Conference goal
The conference goal for the Library of Congress Subject Headings (LCSH), Library of Congress Classification (LCC) and the Dewey Decimal Classification (DDC) is to encourage wider use of these schemes for resource description and discovery. In considering new uses for our traditional subject access systems, it is useful to review how widely these schemes are currently used. Sometimes we forget the role they play in subject retrieval worldwide as we are overwhelmed by news of what the web is doing or not doing.
As reported by Magda Heiner-Freiling, LCSH is heavily used in national libraries outside the United States [1]. Twenty-four national libraries use LCSH in their national bibliographies. This number does not include the many translations and adaptations of LCSH throughout the world, or other familiar subject heading systems based on LCSH. When we turn our attention to LCC, we see that it has become increasingly available and accessible. Any of us who have worked with large-scale bibliographic or classification data recognize what an enormous accomplishment it was to convert LCC to the MARC classification format. Both its conversion and the way it has been made usable in Classification Plus are accomplishments that the Library of Congress can be proud of. Turning to Dewey, we find that the DDC is sometimes thought of only as a scheme used by public and school libraries, but when we look outside the U.S. we see that DDC is the most widely used classification scheme. The DDC is used in more than 135 countries and has been translated into more than thirty languages [2].
Subject Access on the Web
Next, I would like to turn my attention to subject access on the web. Despite the demonstrated value of our authorized schemes, the application of these systems to web resources is minimal. Quoting Marcia Bates, Lois reminds us how controlled vocabularies provide the consistency, accuracy and control that enable the efficient discovery and retrieval of resources in libraries. Yet despite these benefits, library subject access systems are used only in a very small way on the web.
To investigate the application of two of these systems, I analyzed the assignment of classification numbers to electronic resources in the CORC database. I looked at DDC and LCC usage in that database, excluding the NetFirst records since they all have Dewey numbers assigned to them. I found approximately 98,000 uses of DDC and 85,000 uses of LC Classification numbers. Although this is a sizable number of records, representing a considerable amount of effort by librarians and other metadata specialists, the use of library categorizations for organizing web resources is essentially invisible on the web. These records and similar ones in library catalogs are considerably less accessible than web sites indexed by Internet search engines and directory services.
When you compare the CORC project to the many European subject gateway projects, you find a similar commitment to identifying and describing web resources using standard subject schemes and metadata standards (e.g., Dublin Core). The subject schemes used include the Ei thesaurus, Agrovoc thesaurus, MeSH, UDC, DDC, etc. It is important to note that, although these gateways are openly accessible on the web, no dominant subject approach has emerged.
Research and Development
Over the past 10 years our research, development and standards efforts have been largely focused on making these schemes easier for humans to use and apply. There are many accomplishments in this regard:
Through these efforts, we have made LCSH, LCC, and DDC easier to use, but as a whole, the web community has not embraced our schemes. The web community does not understand library subject systems and has little knowledge of them, and what is known is often based on misinformation. Library schemes are perceived to be outmoded, out-of-date, and only useful for print and older materials.
To overcome these biases, we will need to reengineer and re-conceive our schemes for new uses, including
Lois discusses several adaptations and new uses for LCSH, including a faceted LCSH that may be better suited to the requirements of electronic resources. The DDC is also being used for non-traditional applications in the networked environment. An XML version of Dewey is being used in a pilot project to provide high-level browsing across several subject gateways [3]. To encourage such experimentation and exploration, multiple representations of these schemes will be needed, such as MARC, XML and RDF. An example of a Dewey record in XML is shown below:
<?xml version="1.0"?>
<!-- Copyright (C) 2001 OCLC Online Computer Library Center, Inc. -->
<!-- All rights reserved. -->
<rec>
<en><a><ddc>006.31</ddc></a></en>
<eh><a>*Machine learning</a></eh>
<nin><a>Including</a><b>genetic algorithms</b></nin>
<nse><a>For</a><b>machine learning in knowledge-based systems</b><c>,
see</c><d><ddc>006.331</ddc></d><t>.</t></nse>
<nfx><f>*</f><a></a><b>Use notation <ddc>T1--019</ddc> from Table 1 as modified at
<ddc>004.019</ddc></b><t>.</t></nfx>
<ieh><a>Genetic algorithms</a><b>computer science</b><b>artificial intelligence</b></ieh>
<ieh><a>Machine learning</a></ieh>
<SM><f0>sh 94004662</f0><a>Computational learning theory</a></SM>
<FM><f0>sh 94004662</f0><a>Computational learning theory</a><b>--Congresses</b></FM>
<SM><f0>sh 91000149</f0><a>Computer algorithms</a></SM>
<FM><f0>sh 91000149</f0><a>Computer algorithms</a><b>--Congresses</b></FM>
<SM><f0>sh 92002377</f0><a>Genetic algorithms</a></SM>
<EM><f0>sh 96010308</f0><a>Genetic programming (Computer science)</a></EM>
<SM><f0>sh 96010308</f0><a>Genetic programming (Computer science)</a></SM>
<FM><f0>sh 96010308</f0><a>Genetic programming (Computer science)</a></FM>
<SM><f0>sh 85079324</f0><a>Machine learning</a></SM>
<FM><f0>sh 85079324</f0><a>Machine learning</a></FM>
<FM><f0>sh 85079324</f0><a>Machine learning</a><b>--Congresses</b></FM>
<SM><f0>sh 90001937</f0><a>Neural networks (Computer science)</a></SM>
<SM><f0>sh 92000704</f0><a>Reinforcement learning (Machine learning)</a></SM>
</rec>
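As a rough illustration of how such an XML representation might be consumed by software, the following Python sketch reads an abbreviated copy of the record above and pulls out the class number, the caption, and the mapped subject headings. The element meanings are inferred from the sample only: treating `en` as the class number, `eh` as the caption, and `SM`, `FM` and `EM` as mapped LCSH headings is an assumption, as is stripping the leading asterisk from the caption. The function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample record; not the full record shown above.
RECORD = """<rec>
<en><a><ddc>006.31</ddc></a></en>
<eh><a>*Machine learning</a></eh>
<SM><f0>sh 85079324</f0><a>Machine learning</a></SM>
<FM><f0>sh 85079324</f0><a>Machine learning</a><b>--Congresses</b></FM>
<SM><f0>sh 92002377</f0><a>Genetic algorithms</a></SM>
</rec>"""

# Assumption: these three element names all carry mapped subject headings.
HEADING_TAGS = {"SM", "FM", "EM"}


def summarize(record_xml):
    """Return (class number, caption, [(heading id, heading text), ...])."""
    rec = ET.fromstring(record_xml)
    number = rec.findtext("en/a/ddc")
    # Assumption: the leading asterisk is a note marker, not part of the caption.
    caption = rec.findtext("eh/a", default="").lstrip("*")
    headings = []
    for el in rec:  # walk children in document order
        if el.tag in HEADING_TAGS:
            text = (el.findtext("a") or "") + (el.findtext("b") or "")
            headings.append((el.findtext("f0"), text))
    return number, caption, headings


number, caption, headings = summarize(RECORD)
print(number, caption)
for heading_id, text in headings:
    print(heading_id, text)
```

The same fields could equally be carried in MARC or RDF; the point of the sketch is only that a structured record of this kind is straightforward to process outside a traditional library system.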
I will conclude with an example that shows how the Dewey classification can be employed in another nontraditional way: to categorize search results. This example was inspired by a talk given by Susan Dumais, a researcher from Microsoft [4]. She and her colleagues evaluated two basic interfaces for structuring search results, a category interface and a list interface. The interfaces were developed to investigate the cognitive processes that lead to effective analysis of results. In the category interface, search results are organized into hierarchical categories; in the list interface, search results are presented as a ranked list. Automatic text categorization was used to assign the web pages to a broad set of categories based on those used on the LookSmart site [5]. At the ASIS&T SIG/CR Classification Research Workshop, Dumais reported that users were not hampered by misclassified items or by results presented in multiple categories. The sites that could not be categorized were presented in a NotCategorized group.
Through user studies, the researchers found that users preferred the category interface and were 50% faster at finding relevant information with it. These results underscore the statements of other speakers at this meeting who called for a greater tolerance for inconsistency or dissonance in our own processes.
To explore whether a similar approach might work in the library environment, I searched the CORC catalog for the term "cookies." As you can imagine, such a term has multiple meanings. I chose CORC because the resource catalog contains a mixture of traditionally cataloged materials and materials under looser bibliographic control, i.e., with DDC numbers assigned using automatic classification [6]. For every record that had a Dewey number, I mapped the number up to its three-digit Dewey number. What you see in the example below is a portion of the search results presented using Dewey categories at the third level of the hierarchy.
Data processing Computer science
Computer programming, programs, data
Food and drink
If you were to translate the labels into Dewey numbers, the first one is 004, the second is 005 and the third one on the page is 641. The same set of results (54 records) that would normally have appeared in a ranked list is shown here broken down by Dewey categories. In this example, resources about Internet cookies appear in the first two categories and resources about the cookies that we like to eat are in the food and drink category. One resource made it into both types of categories; it was one of those dissonant records. In spite of that, the results are promising and suggest that new applications of traditional schemes are possible and that additional experimentation is needed.
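The grouping step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure actually used with CORC: the sample titles and their Dewey numbers are invented for illustration, the function names `three_digit` and `group_by_category` are hypothetical, and only the three category labels come from the example above.

```python
from collections import defaultdict

# Category labels taken from the example in the text.
LABELS = {
    "004": "Data processing Computer science",
    "005": "Computer programming, programs, data",
    "641": "Food and drink",
}

# Hypothetical records: (title, list of assigned Dewey numbers).
SAMPLE = [
    ("Managing browser cookies", ["004.678"]),
    ("Cookie handling in web scripts", ["005.276"]),
    ("Holiday cookie recipes", ["641.8654"]),
    ("Cookies: a mixed guide", ["005.8", "641.8654"]),
]


def three_digit(ddc):
    """Map a Dewey number up to its three-digit class, e.g. 641.8654 to 641."""
    return ddc.strip().split(".")[0][:3]


def group_by_category(records):
    """Group titles under each three-digit class assigned to them."""
    groups = defaultdict(list)
    for title, numbers in records:
        # A record with numbers in different classes appears in each of them.
        for category in sorted({three_digit(n) for n in numbers}):
            groups[category].append(title)
    return dict(sorted(groups.items()))


for category, titles in group_by_category(SAMPLE).items():
    print(category, LABELS.get(category, "(no label)"))
    for title in titles:
        print("   ", title)
```

In this sketch a record carrying numbers from two different classes is simply listed under both, which mirrors the dissonant record noted above rather than trying to resolve it.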
References
Library of Congress, January 31, 2001