Exploiting LCSH, LCC, and DDC to Retrieve Networked Resources

Comments by Diane Vizine-Goetz

Final version

Conference goal

The conference goal for the Library of Congress Subject Headings (LCSH), the Library of Congress Classification (LCC), and the Dewey Decimal Classification (DDC) is to encourage wider use of these schemes for resource description and discovery. In considering new uses for our traditional subject access systems, it is useful to review how widely these schemes are currently used. Amid the constant news about what the web is or is not doing, we sometimes forget the role these schemes play in subject retrieval worldwide.

As Magda Heiner-Freiling reports, LCSH is heavily used by national libraries outside the United States [1]. Twenty-four national libraries use LCSH in their national bibliographies, and this number does not include the many translations and adaptations of LCSH throughout the world or the other familiar subject heading systems based on it. Turning to LCC, we see that it has become increasingly available and accessible. Anyone who has worked with large-scale bibliographic or classification data recognizes what an enormous accomplishment it was to convert LCC to the MARC classification format. Not only the conversion itself, but also the way LCC has been made usable in Classification Plus, are accomplishments of which the Library of Congress can be proud. As for Dewey, the DDC is sometimes thought of only as a scheme for public and school libraries, but outside the U.S. it is the most widely used classification scheme: the DDC is used in more than 135 countries and has been translated into more than thirty languages [2].

Subject Access on the Web

Next, I would like to turn my attention to subject access on the web. Quoting Marcia Bates, Lois reminds us that controlled vocabularies provide the consistency, accuracy and control that enable the efficient discovery and retrieval of resources in libraries. Yet despite this demonstrated value, library subject access systems are applied only minimally to web resources.

To investigate how two of these systems are applied, I analyzed the classification numbers assigned to electronic resources in the CORC database. I looked at DDC and LCC usage, excluding the NetFirst records, since all of them have Dewey numbers assigned. I found approximately 98,000 uses of DDC numbers and 85,000 uses of LC Classification numbers. Although this is a sizable number of records, representing considerable effort by librarians and other metadata specialists, the use of library classifications to organize web resources is essentially invisible on the web. These records, and similar ones in library catalogs, are considerably less accessible than web sites indexed by Internet search engines and directory services.
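The tally described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run against CORC: it assumes a hypothetical record structure (a dict with a `source` value and a `fields` mapping from MARC tags to lists of values), and it counts every occurrence of MARC 21 field 082 (Dewey Decimal Classification Number) and field 050 (Library of Congress Call Number) as one "use". Which fields the original analysis counted is not stated, and the sample records are invented.

```python
from collections import Counter

# MARC 21 tags counted here (an assumption about what a "use" is):
#   082 = Dewey Decimal Classification Number
#   050 = Library of Congress Call Number
SCHEME_TAGS = {"082": "DDC", "050": "LCC"}


def count_classification_uses(records, excluded_sources=frozenset({"NetFirst"})):
    """Count classification-number fields per scheme.

    `records` is a hypothetical structure: an iterable of dicts, each with a
    "source" string and a "fields" dict mapping MARC tags to lists of values.
    Records from an excluded source (NetFirst, whose records all carry DDC
    numbers) are skipped so they do not inflate the DDC total.
    """
    counts = Counter({scheme: 0 for scheme in SCHEME_TAGS.values()})
    for record in records:
        if record.get("source") in excluded_sources:
            continue
        fields = record.get("fields", {})
        for tag, scheme in SCHEME_TAGS.items():
            counts[scheme] += len(fields.get(tag, []))
    return counts


# Invented sample records, for illustration only.
sample = [
    {"source": "NetFirst", "fields": {"082": ["641.5"]}},
    {"source": "CORC", "fields": {"082": ["005.8"], "050": ["QA76.9"]}},
    {"source": "CORC", "fields": {"050": ["TX773"]}},
]
print(count_classification_uses(sample))
```

The NetFirst record is skipped, so the sample yields one DDC use and two LCC uses.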

When you compare the CORC project with the European subject gateway projects, of which there are many, you find a similar commitment to identifying and describing web resources with standard subject schemes and metadata standards (e.g., Dublin Core). The subject schemes used include the Ei thesaurus, the Agrovoc thesaurus, MeSH, UDC and DDC. It is important to note that, although these gateways are openly accessible on the web, no dominant subject approach has emerged.

Research and Development

Over the past 10 years our research, development and standards efforts have been largely focused on making these schemes easier for humans to use and apply. There are many accomplishments in this regard:

Through these efforts, we have made LCSH, LCC, and DDC easier to use, but on the whole the web community has not embraced our schemes. The web community does not understand library subject systems and has little knowledge of them, and what it does know is often based on misinformation. Library schemes are perceived as outmoded and useful only for print and older materials.

To overcome these biases, we will need to reengineer and re-conceive our schemes for new uses, including

Lois discusses several adaptations and new uses for LCSH, including a faceted LCSH that may be better suited to the requirements of electronic resources. The DDC is also being used for non-traditional applications in the networked environment. An XML version of Dewey is being used in a pilot project to provide high-level browsing across several subject gateways [3]. To encourage such experimentation and exploration, multiple representations of these schemes will be needed, such as MARC, XML and RDF. An example of a Dewey record in XML is shown below:

<?xml version="1.0"?>
<!-- Copyright (C) 2001 OCLC Online Computer Library Center, Inc. -->
<!-- All rights reserved. -->
<rec>
<en><a><ddc>006.31</ddc></a></en>
<eh><a>*Machine learning</a></eh>
<nin><a>Including</a><b>genetic algorithms</b></nin>
<nse><a>For</a><b>machine learning in knowledge-based systems</b><c>, see</c><d><ddc>006.331</ddc></d><t>.</t></nse>
<nfx><f>*</f><a></a><b>Use notation <ddc>T1--019</ddc> from Table 1 as modified at <ddc>004.019</ddc></b><t>.</t></nfx>
<ieh><a>Genetic algorithms</a><b>computer science</b><b>artificial intelligence</b></ieh>
<ieh><a>Machine learning</a></ieh>
<SM><f0>sh 94004662</f0><a>Computational learning theory</a></SM>
<FM><f0>sh 94004662</f0><a>Computational learning theory</a><b>--Congresses</b></FM>
<SM><f0>sh 91000149</f0><a>Computer algorithms</a></SM>
<FM><f0>sh 91000149</f0><a>Computer algorithms</a><b>--Congresses</b></FM>
<SM><f0>sh 92002377</f0><a>Genetic algorithms</a></SM>
<EM><f0>sh 96010308</f0><a>Genetic programming (Computer science)</a></EM>
<SM><f0>sh 96010308</f0><a>Genetic programming (Computer science)</a></SM>
<FM><f0>sh 96010308</f0><a>Genetic programming (Computer science)</a></FM>
<SM><f0>sh 85079324</f0><a>Machine learning</a></SM>
<FM><f0>sh 85079324</f0><a>Machine learning</a></FM>
<FM><f0>sh 85079324</f0><a>Machine learning</a><b>--Congresses</b></FM>
<SM><f0>sh 90001937</f0><a>Neural networks (Computer science)</a></SM>
<SM><f0>sh 92000704</f0><a>Reinforcement learning (Machine learning)</a></SM>
</rec>
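Because the record is ordinary XML, a standard parser can pull out the pieces a browsing or mapping application would need. The sketch below is a hedged illustration using Python's `xml.etree.ElementTree`: it reads the class number, the caption, and the LCSH headings (each with an authority identifier such as sh 85079324) from a record shaped like the one above. The element names are copied from the example; what distinguishes SM, FM and EM is not explained here, so the sketch reports each heading with its tag rather than interpreting it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_dewey_record(xml_text):
    """Return (class number, caption, headings) for a Dewey record shaped
    like the example above. Headings are (tag, authority id, text) tuples."""
    root = ET.fromstring(xml_text)
    number = root.findtext("en/a/ddc")
    # The leading "*" on the caption points to the <nfx> footnote; drop it.
    caption = (root.findtext("eh/a") or "").lstrip("*")
    headings = []
    for elem in root:  # direct children of <rec>
        if elem.tag in ("SM", "FM", "EM"):
            # Heading text = main heading (<a>) plus any subdivisions (<b>).
            text = (elem.findtext("a") or "") + "".join(
                b.text or "" for b in elem.findall("b")
            )
            headings.append((elem.tag, elem.findtext("f0"), text))
    return number, caption, headings


# A shortened copy of the record above.
record = """<rec>
<en><a><ddc>006.31</ddc></a></en>
<eh><a>*Machine learning</a></eh>
<SM><f0>sh 85079324</f0><a>Machine learning</a></SM>
<FM><f0>sh 85079324</f0><a>Machine learning</a><b>--Congresses</b></FM>
</rec>"""

number, caption, headings = summarize_dewey_record(record)
print(number, caption)
for tag, authority_id, heading in headings:
    print(tag, authority_id, heading)
```

For the shortened record this prints the number 006.31 with the caption "Machine learning", followed by the two headings "Machine learning" and "Machine learning--Congresses".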

I will conclude with an example that shows how the Dewey classification can be employed in another nontraditional way: to categorize search results. This example was inspired by a talk given by Susan Dumais, a researcher at Microsoft [4]. She and her colleagues evaluated two basic interfaces for structuring search results: a category interface and a list interface. The interfaces were developed to investigate the cognitive processes that lead to effective analysis of results. In the category interface, search results are organized into hierarchical categories; in the list interface, they are presented as a ranked list. Automatic text categorization was used to assign the web pages to a broad set of categories based on those used on the LookSmart site [5]. Sites that could not be categorized were presented in a NotCategorized group. At the ASIS&T SIG/CR Classification Research Workshop, Dumais reported that users were not hampered by misclassified items or by results that appeared in multiple categories.

Through user studies, the researchers found that users preferred the category interface and were 50% faster at finding relevant information with it. These results underscore the statements of other speakers at this meeting who called for greater tolerance of inconsistency or dissonance in our own processes.
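The two interfaces differ only in how the same ranked results are presented. A minimal sketch of the category display follows, assuming each result already carries zero or more category labels; the automatic text categorization used in the study is not reproduced, and the function name and sample titles are hypothetical. Items keep their rank order within each group, an item with several labels is listed under each, and unlabeled items go to a final NotCategorized group. Ordering the groups by their best-ranked item is a choice made for this sketch, not something the study is reported to have done.

```python
NOT_CATEGORIZED = "NotCategorized"


def group_ranked_results(results):
    """Turn a ranked list of (title, categories) pairs, best first, into an
    ordered mapping of category -> titles in their original rank order."""
    groups = {}  # dicts keep insertion order, so groups follow first appearance
    uncategorized = []
    for title, categories in results:
        if not categories:
            uncategorized.append(title)
        # dict.fromkeys drops repeated labels while preserving their order.
        for category in dict.fromkeys(categories):
            groups.setdefault(category, []).append(title)
    if uncategorized:
        groups[NOT_CATEGORIZED] = uncategorized
    return groups


ranked = [
    ("Privacy FAQ", ["Computers"]),
    ("Unlabeled page", []),
    ("Allergy recipes FAQ", ["Computers", "Food"]),
    ("Cookie recipes", ["Food"]),
]
for category, titles in group_ranked_results(ranked).items():
    print(category, titles)
```

Here "Allergy recipes FAQ" appears under both Computers and Food, and "Unlabeled page" ends up in NotCategorized.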

To explore whether a similar approach might work in the library environment, I searched the CORC catalog for the term "cookies." As you can imagine, the term has multiple meanings. I chose CORC because its resource catalog contains a mixture of traditionally cataloged materials and materials under looser bibliographic control, i.e., records with DDC numbers assigned by automatic classification [6]. For every record that had a Dewey number, I mapped the number up to its three-digit Dewey class. The example below shows a portion of the search results, grouped by Dewey categories at the third level of the hierarchy.

    Data processing Computer science

  1. FTP Site of NeoSoft. The FTP site at ftp://ftp.neosoft.com contains the NeoSoft archives. This FTP site is run by NeoSoft, Houston, Texas, in the USA, in a time zone -6 hours from GMT. To access this site over the Web, use URL ftp://ftp.neosoft.com/. The FTP server runs on the UNIX operating system. It also goes by the name of uuneo.neosoft.com.
  2. Privacy.net. Features Privacy.net, which provides information about privacy and the Internet, compiled by Consumer.net, a consumer information organization. Discusses cookies, information gathering, encryption, and more.

    Computer programming, programs, data

  3. Misc.kids Frequently Asked Questions (FAQs): Allergies and Asthma and Recipes. Features recipes for people with allergies and asthma. Notes that the recipes are part of the FAQ section on allergies and asthma of the misc.kids newsgroup. Explains that the information in the FAQ is not intended to replace medical advice. Lists wheat and gluten free recipes for bread, muffins, pancakes, cakes, cookies, and desserts. Provides milk and egg free recipes for cakes, cookies, and desserts. Links to the FAQ section, allergy and asthma resources, and book reviews.
  4. Cookies
  5. Web Developer's Library (WDVL): Webmaster's Lexicon. Presents a glossary of terms useful to webmasters as part of the Web Developer's Virtual Library (WDVL). Allows users to select individual terms or letters of the alphabet to search for definitions. Defines ActiveX, background, cookies, database, FAQ, graphic, and many other terms. Links to a Web authoring guide, tutorials, a FAQ section, and other Web design-related sites.
  6. Programming in JavaScript, Volume two

    Food and drink

  7. CookieRecipe.com. Presents recipes for all types of cookies. Includes recipes for bar cookies, Christmas cookies, drop cookies, filled cookies, International cookies, molded cookies, no bake cookies, refrigerator cookies, rolled cookies, sugar free cookies, eggless cookies, and gluten-free cookies. Contains a site search engine. Offers conversion tables for common ingredients, as well as tips and hints. Allows the user to participate in a recipe exchange and submit requests for recipes. Provides a weekly listing of the ten most popular recipes. Links to BreadRecipe.com, PieRecipe.com, and CakeRecipe.com.
  8. Egg-stra Delicious Recipes Just for Easter. Presents a collection of Easter recipes. Includes recipes for candy, cakes, cookies, rolls, and cupcakes. Links to other recipe and Easter related Web sites.
  9. Cookies and Bars. Features an index of recipes for various cookies and bars. Lists the recipes in alphabetical order. Includes cookies and bars such as snickerdoodles, baklava, biscotti, brownies, gingerbread cookies, lemon bar cookies, and others.
  10. Misc.kids Frequently Asked Questions (FAQs): Allergies and Asthma and Recipes. Features recipes for people with allergies and asthma. Notes that the recipes are part of the FAQ section on allergies and asthma of the misc.kids newsgroup. Explains that the information in the FAQ is not intended to replace medical advice. Lists wheat and gluten free recipes for bread, muffins, pancakes, cakes, cookies, and desserts. Provides milk and egg free recipes for cakes, cookies, and desserts. Links to the FAQ section, allergy and asthma resources, and book reviews.
  11. M&M's Chocolate Mini Baking Bits. Presents information on M&M's Chocolate Mini Baking Bits from Mars, Inc. Includes recipes for baking with the bits and several hints for successful baking on topics such as choosing butter or margarine, measuring ingredients, preheating the oven, selecting baking sheets, preparing baking sheets, sizing and shaping cookies, storing baked goods, and freezing baked goods. Provides access to a tour of the manufacturing process of the bits. Contains a FAQ section.

Translated into Dewey numbers, the three labels on the page are 004, 005 and 641. The same set of results (54 records), which would normally have appeared as a ranked list, is shown here broken down by Dewey category. Resources about Internet cookies appear in the first two categories, and resources about the cookies we like to eat appear in the food and drink category. One resource, the misc.kids FAQ on allergies, asthma and recipes, appears in both types of category; it is one of those dissonant records. In spite of that, the results are promising and suggest that new applications of traditional schemes are possible and that more experimentation is needed.
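The step of mapping each assigned number up to its three-digit class, and labeling the class with its caption, can be sketched as follows. This is a hedged illustration, not the code used for the CORC example: the caption table holds only the three labels shown above, the function names are hypothetical, and the Dewey numbers in the usage line are invented rather than taken from the CORC records. Only the leading three digits are used, so segmentation marks later in a number do not matter.

```python
import re

# Third-level captions used as labels in the "cookies" example above.
CAPTIONS = {
    "004": "Data processing Computer science",
    "005": "Computer programming, programs, data",
    "641": "Food and drink",
}


def third_level_class(ddc_number):
    """Map an assigned Dewey number up to its three-digit class, e.g.
    '641.8654' -> '641'. Returns None if the string does not start with
    three digits (for example a bare table number such as 'T1--019')."""
    match = re.match(r"\s*(\d{3})", ddc_number)
    return match.group(1) if match else None


def category_labels(ddc_numbers):
    """Distinct category labels for one record's Dewey numbers, in order.

    A record whose numbers fall in different classes gets several labels,
    which is how one record can appear in more than one category.
    Classes without a known caption fall back to the bare number."""
    labels = []
    for number in ddc_numbers:
        cls = third_level_class(number)
        if cls is None:
            continue
        label = CAPTIONS.get(cls, cls)
        if label not in labels:
            labels.append(label)
    return labels


print(category_labels(["005.72", "641.5631"]))
```

With these invented numbers the record receives both the "Computer programming, programs, data" and "Food and drink" labels, mirroring the dissonant record in the example.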

References

  1. Heiner-Freiling, Magda (2000). Survey on Subject Heading Languages Used in National Libraries and Bibliographies. Cataloging & Classification Quarterly 29(1/2): 189-198.
  2. About Dewey and OCLC Forest Press. Available at http://www.oclc.org/dewey/about/index.htm
  3. Renardus. Available at http://www.renardus.org/index.html
  4. Dumais, S. T., E. Cutrell, and H. Chen (2000). Classified Displays of Web Search Results. Invited presentation at the ASIS&T SIG/CR Classification Research Workshop, November 12, 2000. Available at http://uma.info-science.uiowa.edu/sigcr/papers/sigcr00dumais.doc
  5. LookSmart. Available at http://www.looksmart.com/
  6. OCLC CORC / About CORC. Available at http://www.oclc.org/oclc/corc/about/corc_over.htm

Library of Congress
January 31, 2001