A Comparison of Web Resource Access Experiments:
Planning for the New Millennium

Jane Greenberg
Assistant Professor
School of Information and Library Science
University of North Carolina at Chapel Hill
CB #3360. 207 Manning Hall
Chapel Hill, NC 27599-3360


Final version

Introduction

The exponential growth of the Internet, particularly the Web, has created many new challenges for librarians in their role as collectors, organizers, and access providers of information resources. This is particularly evident in the area of bibliographic control, where librarians are conducting a wide variety of experiments in order to provide access to new information formats and resources that are often volatile. While these experiments have been operational for a few years and have been evaluated at least on an informal level (see project homepages, URLs given below in the Method section of this paper), there is little evidence of their having been compared in any unified way. One reason for this limitation is that these experiments are fairly new and developers have been more concerned with implementation than with formal comparisons. It is likely that the innate differences among these experiments have also hampered comparison activities. That is, while these experiments all aim to improve Web resource access by bibliographic means, they differ greatly in their design and features, which makes comparison a difficult task. Despite these difficulties, the experiments can be compared at least on a general level, and they need to be compared if efforts in this area are to improve in the new Millennium.

Another way to consider this issue is that if researchers and leaders in the bibliographic control community had sufficient knowledge of what characteristics contributed to successful experimentation, those characteristics could be continued and incorporated into future projects. Likewise, an agreed-upon list (perhaps even an official list) of considerations for improvement could direct future research agendas, encourage the development of alternative and innovative techniques, and ultimately have a positive impact on the next generation of experiments that aim to provide access to Web resources. The multi-case study reported on in this paper addresses these issues by examining five leading Web-based bibliographic access experiments and comparing them in a unified manner.

Experimentation and Web Resource Access

Experimentation in the scientific world involves the use of methodical investigative techniques to examine a particular problem or a series of problems. The bibliographic control community has responded to the problem of resource access that stems from the Web's exponential growth with experiments, such as OCLC's CORC project; UKOLN's BIBLINK, ROADS, and DESIRE projects; the NORDIC project; and a series of other initiatives. These experiments aim to improve the finding, gathering, and evaluating functions outlined in Cutter's Rules for a Dictionary Catalog (1904), and provide a stimulus for bibliographic control activities in the context of the Web. Several other factors allow these projects to be defined as experiments:

Project-specific evaluation is an important activity and can be critical to the success of these experiments. What is equally important at this stage of Web-based bibliographic control experimentation is research like that presented in this paper, which compares these initiatives.

Objectives of the Study

The study reported on in this paper examined experiments that aim to improve access to Web resources via bibliographic control methods. The objective was to compare these experiments and to identify characteristics of success and considerations for improvement. The following two research questions guided the study:

  1. What characteristics are found in successful Web resource access experiments?
  2. How can these Web resource access experiments be improved in the future?

Method

The investigation was a multi-case study that compared five experiments developed to improve access to Web resources via bibliographic control processes. The multi-case study method was selected because it was the best way to observe relationships among these experiments. The experiments examined were:

  1. BIBLINK: Linking Publishers and National Bibliographic Services [http://hosted.ukoln.ac.uk/biblink/]
  2. DESIRE (Development of a European Service for Information on Research and Education) [http://www.desire.org/]
  3. Nordic Metadata [http://linnea.helsinki.fi/meta/]
  4. OCLC CORC (Cooperative Online Resource Catalog) [http://www.oclc.org/oclc/corc/]
  5. ROADS (Resource Organisation and Discovery in Subject-based services) [http://www.ilrt.bris.ac.uk/roads/]

A table outlining project goals and status is found in Appendix A.

Evaluation Criteria

A framework composed of five evaluation criteria served as a basis for the study and allowed the experiments to be compared in a unified manner. These are defined as follows:

  1. Organizational structure.
    The experiment's structural foundation defined by its goals, administration (project leaders, members, and partners), and funding.

  2. Reception.
    The experiment's acceptance by the professional information community (e.g., librarians and other information professionals) and the larger general public.

  3. Duration.
    The experiment's time expanse and indicators of progression (e.g., alpha and beta release, version number, or phase).

  4. Application of computing technology.
    The experiment's exploitation of computing technology.

  5. Use of human resources.
    The experiment's ability to harness and optimize human knowledge and skills.

The criteria framework allowed the five experiments to be studied in a unified manner despite their differences. The procedures were criteria-centric, in that each criterion was examined one at a time across all five experiments before the next criterion was studied.
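The criteria-centric ordering described above can be sketched in a few lines of Python. This is only an illustration of the order of examination, not a tool used in the study: the criterion and experiment names come from this paper, while the function names and the empty note grid are assumptions introduced for the example.

```python
# Criterion and experiment names are taken from the paper; everything else
# (function names, the empty "notes" grid) is hypothetical.
CRITERIA = [
    "Organizational structure",
    "Reception",
    "Duration",
    "Application of computing technology",
    "Use of human resources",
]

EXPERIMENTS = ["BIBLINK", "DESIRE", "Nordic Metadata", "OCLC CORC", "ROADS"]


def criteria_centric_order(criteria, experiments):
    """Return (criterion, experiment) pairs with the criterion as the outer loop.

    Every experiment is visited for one criterion before the next criterion
    begins, which is the ordering the Method section describes.
    """
    return [(c, e) for c in criteria for e in experiments]


def empty_comparison_grid(criteria, experiments):
    """Build a criterion -> experiment -> notes grid; all notes start empty."""
    return {c: {e: "" for e in experiments} for c in criteria}
```

Iterating over `criteria_centric_order(CRITERIA, EXPERIMENTS)` visits all five experiments under "Organizational structure" before moving on to "Reception". An experiment-centric design would simply swap the two loops.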

Results

The multi-case study permitted the identification of characteristics of success and considerations for improvement in experiments that use bibliographic control methods to improve access to Web resources. These results are presented below within the context of the evaluation criteria underlying the study.

Characteristics of Success

Determining the overall success of each experiment requires an in-depth analysis beyond the scope of a single paper. This study took a more general approach and compared these experiments at a higher level in order to identify characteristics of success that were applicable to all of the projects. These characteristics can be discussed under the framework of the five evaluation criteria used in this study.

Organizational structure

The study revealed a number of similarities across the experiments that appear to have contributed to their success. To begin with, each experiment is defined by a list of goals.[1] Clearly, goals alone do not guarantee success, and it is recognized that goals may change throughout the experimental process. What is important here is that the goals provide a focused direction and appear to contribute to a successful experiment. Related to goals, each experiment has a defined administrative structure composed of project leaders, members, and/or partners. To take this observation a step further, project participants were found well beyond the confines of a single institution. It seems that a clear administrative structure, particularly one overseeing decision-making processes, and partnerships that extend beyond a single institution may both be characteristics of successful experimentation. A final factor under this criterion that, no doubt, contributes to successful experimentation is adequate funding. Government funding supported four of the five experiments, and CORC is funded by its members, which are mainly libraries.

Reception

How well a development is received by a community can be an indicator of success. The multi-case study was enhanced with an electronic survey that was sent to five bibliographic control professionals, five information professionals who are not engaged in bibliographic control, and five general Web users.[2] The survey asked participants whether they had knowledge of any of the five experiments examined in this study, whether they had knowledge of Yahoo! and Lycos, and what their preferred starting point for a Web search was. While the participant sample was a convenience sample, and not necessarily statistically sound, the survey results, taken together with the results of the multi-case study, are helpful in examining this criterion.

A majority of the bibliographic control professionals were aware of more than one of the experiments evaluated in the multi-case study, and three of the five persons in this group referred to the availability of what they called excellent documentation and tools. Examples include the NORDIC Dublin Core metadata templates (http://www.lub.lu.se/cgi-bin/) and the corresponding User Guidelines for Dublin Core Creation (e.g., http://www.sics.se/~preben/DC/DC_guide.html). All five of the experiments evaluated have fairly substantial documentation, and in a number of cases they support access to tools. It is likely that open documentation and access to tools contribute to the success of an experiment.

Duration

The average duration of the experiments examined for this study is three years. This is a result of the fact that the three UKOLN experiments (BIBLINK, DESIRE, and ROADS) had defined time frames in which they successfully completed designated tasks, and that the Web is less than a decade old. The NORDIC project is in its second phase, which began in January 1999, and CORC, which was launched in January 1999, is a fully functional project that is no longer considered experimental. The progression from alpha and beta testing, and/or through various phases or versions, is demonstrative of success. What is particularly exciting in the realm of duration is the model offered by CORC, which promotes continued growth of the experiment via its transition from the experimental stage to a fully operational project. Related to this observation, however, one must consider the results of experiments that may have had a shorter time frame but have a long-term impact. For example, the ROADS project software toolkit is still accessible and continues to be used in various services like the Social Science Information Gateway (SOSIG) and the OMNI Health and Medicine Gateway (e-mail correspondence with Michael Day, Research Officer, UKOLN, the UK Office for Library and Information Networking, University of Bath).

Application of computing technology

Today, it is impossible to discuss bibliographic control and computing technology without referring to online catalogs and the MARC format. These developments take advantage of computing capabilities to support interoperability amongst information systems, expedient and efficient resource organization and access, and distributed cataloging via networked communication protocols. The experiments examined for this study have successfully taken advantage of Web-based technology in a similar way in order to support resource discovery and communication among different institutions. Beyond these developments, computing technology offers many more sophisticated capabilities, particularly in the area of information retrieval. The ROADS project has been among the most successful projects in this area, promoting searching across multiple gateways, harvesting, relevance ranking of retrieval results, interface customization, hierarchical browsing, and multi-lingual access (ROADS User Survey Results, 1999). Another example is found with the CORC project, which includes the Scorpion algorithm (Shafer, 1998) for automatic classification.

Use of human resources

The final evaluation criterion involves the use of human resources. Despite great strides by the artificial intelligence community, human beings can still outperform computers in most complex intellectual tasks. This, coupled with the fact that networked protocols can facilitate communication and can unite the talent and skill of persons involved in the production, acquisition, organization, and life of a Web resource, invites new documentation possibilities. The experiments reviewed in the multi-case study involve administrators, bibliographic control and other information professionals, and in some cases Web resource creators (document authors). The NORDIC project and CORC both allow for metadata to be created by professionals and resource creators, as long as there is quality control by professionally trained metadata experts. BIBLINK involves another collaborative relationship in that national bibliographic agencies work with publishers to establish authoritative bibliographic information for electronic resources. These partnerships are successful in that they harness and optimize the expert knowledge of bibliographic control professionals by having them focus more or less exclusively on activities that require their expertise, while persons or agencies with less skill are responsible for the simpler yet time-consuming bibliographic control tasks.

Considerations for Improvement

While the evaluation of Web resource access experiments allowed for the identification of characteristics of success, it also permitted the identification of project features and aspects that could be improved in future initiatives. These considerations for improvement can also be discussed under the rubric of the five evaluation criteria underlying the study.

Organizational structure

The organizational structure, as noted above, seemed to be sufficient for the current undertaking of experiments. What needs to be considered now, however, is long-term experimentation, as seen in initiatives such as CORC and its progression from an experimental state to a fully operational and growing project. Also important to consider is the funding and membership structure of CORC, which is slightly different from that of the other experiments. CORC is supported by a source of consistent funding from its members, and it involves more partners than any of the other experiments examined. A consideration for improvement is to design experiments that include a plan for continued funding and partnerships that extend beyond a single institution.

Related to these considerations is a perceived need for these experiments to talk to each other and interoperate, particularly in cases where they are using the same features, or where one feature could be enhanced by another feature. There is evidence of interoperability among several of the partnerships supported by the UKOLN projects, but there is a great deal of room for growth in this area. For example, BIBLINK, CORC, and the NORDIC project all work with variants of the Dublin Core, but they each have their own implementation. ROADS has developed a sophisticated subject gateway that could potentially be used to access CORC and other initiatives. The various experiments in this area could have an even greater impact on Web resource access if they communicated not only on a goal level, but also on an operational level. In other words, a framework is needed that will improve interoperability and permit these experiments to talk to each other more than is currently practiced. Along these lines, experimental initiatives might even consider talking to commercial enterprises, such as search engines, that facilitate access to Web resources, a point that is considered further under the next criterion.
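One way to picture operational-level interoperability among Dublin Core variants is a crosswalk that maps each implementation's local element names onto a shared set of elements. The sketch below is a minimal illustration under stated assumptions: the shared names (title, creator, subject, identifier) are Dublin Core elements, but the two variants and their local field names are invented for the example and do not describe the actual BIBLINK, CORC, or NORDIC implementations.

```python
# A few elements of the Dublin Core element set serve as the shared target.
SHARED_ELEMENTS = {"title", "creator", "subject", "identifier"}

# Hypothetical variants: the local element names below are invented for
# illustration and are not taken from any of the projects discussed.
CROSSWALKS = {
    "variant_a": {
        "DC.Title": "title",
        "DC.Creator": "creator",
        "DC.Subject": "subject",
        "DC.Identifier": "identifier",
    },
    "variant_b": {
        "dc:title": "title",
        "dc:author": "creator",
        "dc:keywords": "subject",
        "dc:url": "identifier",
    },
}


def to_shared(record, variant):
    """Map a local record onto the shared elements.

    Returns (shared, unmapped): shared maps each shared element to a list of
    values; unmapped keeps local fields that have no crosswalk entry so that
    nothing is silently dropped.
    """
    mapping = CROSSWALKS[variant]
    shared, unmapped = {}, {}
    for name, value in record.items():
        target = mapping.get(name)
        if target is None:
            unmapped[name] = value
        else:
            shared.setdefault(target, []).append(value)
    return shared, unmapped
```

Once records from different variants are expressed in the shared elements, a single gateway could search them together. Fields that fall outside the crosswalk are returned separately, since deciding what to do with them is a matter for human review.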

Reception

With respect to the study's supplemental survey on Web searching, the bibliographic control professionals were aware, at least by name, of the majority of the Web resource access experiments evaluated in this multi-case study. However, this group of persons represents a very small segment of the Web user population. Of the five information professionals not engaged in bibliographic control activities, only two reference librarians were aware of CORC; the rest of the information professionals and all of the average Web users had not heard of any of the experiments evaluated. These results were offset by the fact that all of the participants had knowledge of Yahoo! and Lycos, and they named various commercial search engines as their primary means of Web access. The point to consider here is that the larger universe of Web users may use commercial search engines for searching not only because of convenience, but also because they are unaware of Web resource access experiments. This should be of concern to the bibliographic community because these experiments can be costly and labor intensive, and more importantly because it is likely that these experiments will yield far superior retrieval results in various domains. (A comparison between the Web-based bibliographic control experiments and commercial search engine algorithms is beyond the scope of this paper. Additionally, this author believes that access via both mechanisms should not be looked at as being diametrically opposed.)

This said, the limited knowledge about the examined experiments may in part be due to their experimental nature and short-lived status. Also, it is likely that place of origin has had an impact on the general knowledge about these experiments, as only one of the five projects was initiated in the United States. Even so, it is important to remember that Yahoo!, Lycos, Mosaic, and many other Web developments in the United States began as experiments at institutions of higher learning, and are now extremely popular on an international scale.

The bibliographic control community needs to consider building stronger public relations and advertising to populations beyond the bibliographic control community, particularly if these experiments are to thrive. Documentation and tools supporting the experiments examined in this study were attractive to bibliographic control professionals, but it seems these resources may not foster or invite project exploration or use by information professionals not engaged in bibliographic control or by the general Web user. Perhaps the bibliographic control community should explore advertising practices, or some variant of this activity, as seen in the commercial sector. Along these lines, Web resource access experiments might even consider collaboration with commercial search engines or other for-profit initiatives in some form. A partnership with a commercial enterprise, no doubt, requires serious exploration, but it is not unrealistic to ask conference participants to think about this question, especially when it is known that a segment of the Northern Light search engine/index is using a version of the Dublin Core and that commercial search engines are increasingly interested in the application of bibliographic control methods and classification activities, which require the talent and skill of persons trained in bibliographic control methods.

Duration

It is human nature to equate longevity with success. The printed monograph is one of the most successful technological innovations, which, despite all forecasts of its demise in the electronic era, is increasing in number yearly (About Book Title Production, 2000; Library and Information Statistics Tables, 1998). As already indicated, three of the five experiments examined have been completed. Their intent was to investigate various aspects of Web resource access in a certain time frame. But their short lives may be used to raise questions about their long-term value. Again, the long-term value of these projects must be considered, along with the new research to which their results lead. A case in point is the RENARDUS project (http://www.renardus.org), an outgrowth of the DESIRE project, which is establishing an academic subject gateway service with integrated access and plans to be a long-term venture. In sum, a consideration for improvement under this criterion is to develop and implement experiments that have long-term goals, and which aim to become fully operational projects that support Web resource access.

Application of computing technology

Computing technology surpasses the human in terms of speed and consistency and supports a wide variety of automatic techniques that can be incorporated into and strengthen bibliographic control operations. Examples of these techniques include natural language processing, automatic classification and indexing, automatic metadata generation, and searcher profiling. While the experiments examined in this study explore the use of computing technology, most notably the ROADS project, they do not fully incorporate or exploit automatic processing capabilities. Bibliographic control initiatives need to further explore how to take advantage of computing technology, both for its own sake and because only through such efforts can human resources, the last criterion, be fully optimized.
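As a small illustration of automatic metadata generation, the Python sketch below drafts a record from the title and meta tags of a Web page, leaving the result for review by a metadata professional. It is a minimal example under stated assumptions, not the approach taken by any of the experiments: the class and function names, and the mapping from HTML meta names to Dublin Core element names, are hypothetical. Only the standard library's html.parser module is used.

```python
from html.parser import HTMLParser

# Assumed mapping from common HTML meta names to Dublin Core element names.
META_TO_ELEMENT = {
    "author": "creator",
    "keywords": "subject",
    "description": "description",
}


class MetadataHarvester(HTMLParser):
    """Collect the <title> text and selected <meta> tags into a draft record."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            content = attributes.get("content")
            if name in META_TO_ELEMENT and content:
                element = META_TO_ELEMENT[name]
                self.record.setdefault(element, []).append(content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored.
        if self._in_title and data.strip():
            self.record.setdefault("title", []).append(data.strip())


def draft_record(html_text):
    """Return a draft metadata record (element -> list of values)."""
    harvester = MetadataHarvester()
    harvester.feed(html_text)
    harvester.close()
    return harvester.record
```

A draft of this kind covers only what the page author already supplied. Subject analysis, authority control, and quality review remain tasks for trained staff, which is the division of labor discussed under the next criterion.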

Use of human resources

As indicated in the discussion on characteristics of success, these experiments involve administrators, bibliographic control and other information professionals, and in some cases Web resource creators (document authors), and several initiatives have established collaborative partnerships among persons who have varying skills. There is, however, much room for growth in this area, particularly with communication options that are facilitated by networked protocols. Along these lines, the bibliographic control community needs to identify which tasks can be performed accurately, efficiently, and better by the computer, and which tasks need to be performed by people. It also needs to identify specifically who should perform such tasks, so that the bibliographic control professional's expertise can be used to full advantage in providing access to Web resources.

Call for a Strategic Plan

The bibliographic control community has responded to the Web's exponential growth with a series of experiments that aim to improve access to Web resources. These experiments are important because they use traditional cataloging and classification practices and test innovative ideas, processes, and features in environments that extend beyond the library catalog. While these initiatives all aim to improve Web resource access, they differ in some very fundamental ways. Even so, these experiments can be compared, and it is through this type of analysis that the bibliographic control community can identify characteristics of success and considerations for improvement, and initiate superior Web resource access experiments. This paper concludes by suggesting five agenda items that will serve as a base for a strategic plan in this area and by inviting conference participants and other readers of this paper to contribute their ideas and expertise to an effort that will strengthen experimental initiatives that aim to improve access to Web resources in the new Millennium.

  1. Explore considerations for improvement identified via the multi-case study reported on in this paper. These items include exploring the following:

  2. Continue to evaluate projects
    The specific features and practices supporting the experiments examined in this study and in other initiatives need to be researched on an in-depth level, and these projects need to be compared to each other on an array of levels. Moreover, efforts should be made to employ scientific research methods in such investigations to ensure the construction of a sound body of knowledge. Only through such research efforts can a pool of knowledge be developed to improve future Web resource access experimentation and access to Web resources.

  3. Share research
    All of the experiments examined in this study have conducted evaluations, at least on an informal level. The results of these undertakings appear to be accessible via most project Web pages, but they are not generally disseminated to the larger bibliographic control community through professional and scientific publications and conferences. A central vehicle of communication, such as an electronic bulletin board or a Web site, is needed so that the results of all research efforts, both formal and informal, can be shared with the bibliographic and related communities. This type of central sharing ground would greatly assist future initiatives, allow for timely access to results and lessons learned, and permit meta-analyses so that superior Web resource access experiments could be conducted.

  4. Develop an official list of considerations for improvements
    The bibliographic control community should support the construction of an official list of features, applications, and other aspects that could improve Web resource access experiments that employ bibliographic methods. The research conducted here may serve as a starting point, but an official list would assist the larger bibliographic control and information community, and ultimately help to direct future research agendas.

  5. Develop a master Request for Proposal (RFP) for Web resource access experimentation
    The last strategic step to be suggested in this initial draft is to develop a master RFP that could direct Web resource access experimentation. The bibliographic control community has developed master RFPs for online catalogs to share, as demonstrated by CONDOC (Crosby, 1997). A master RFP that recommends an organizational structure, outreach and duration plans, and the best way to use computing technology and human resources could greatly assist institutions and persons at all levels who want to conduct experiments that aim to improve access to Web resources via bibliographic control methods.

Conclusion

This paper defined the Web resource access experimentation environment and reported on the results of a multi-case study that compared a series of experiments that are innately different, but which all aim to improve access to Web resources. The Web, while increasingly perceived as a trusted vehicle for the dissemination and recording of information, is still very much in a developmental stage. In fact, it has been predicted that by the end of the first decade of the new Millennium, the Web will look vastly different and even unrecognizable compared to today. Whether such forecasts will prove true is difficult to gauge, but what seems certain is that bibliographic control methods have a role in the new Millennium. The phenomenal growth of the Web has generated a lot of experimentation, but left little time for formal evaluation of these initiatives. The bibliographic control community must rise to the challenge and encourage and conduct evaluations so that bibliographic control experiments are successful and assist with the organization of, and access to, the information resources that help to define the great domain known as the World Wide Web.

References

Crosby, E. (1997). Towards 'CONDOC 2': identifying new requirements for online catalogs. ALCTS Newsletter 8 (3): A-D.

Cutter, C. A. (1904). Rules for a dictionary catalog, 4th ed., (rewritten). Washington, D.C.: Government Printing Office.

ROADS User Survey Results. (1999). Available at: http://www.ilrt.bris.ac.uk/roads/questions.results

About Book Title Production. (2000). Available at: http://publishing.about.com/arts/publishing/gi/dynamic/offsite.htm?site=http%3A%2F%2Fwww.ipa-uie.org%2Fstatistics%2Fannual_book_prod.html

Library and Information Statistics Tables. (1998). Available at: http://www.lboro.ac.uk/departments/dils/lisu/list98/pub.html

Shafer, K. (1998). A brief introduction to Scorpion. Available at: http://orc.rsch.oclc.org:6109/bintro.html

Acknowledgements

The author would like to thank the following people for comments on this paper, and interest in these experiments: Michael Day, UKOLN Office for Library and Information Networking, University of Bath; Dr. Brian Sturm, School of Information and Library Science, University of North Carolina at Chapel Hill; and Dr. Mary S. Woodley, Social Sciences Librarian, California State University. Thank you also to conference organizers and sponsors for fostering such an important dialog.

Notes

  1. Note that the experiments are discussed in the present tense in this paper for the purpose of clarity. Three of the five experiments have been completed fairly recently, but two are still operational.
  2. A convenience sample was used for this part of the study. The sample consisted of five catalogers/metadata professionals, five information professionals (an archivist, a database administrator, two reference librarians, and a slide curator), and five average Web users (two undergraduate students, an environmental scientist, a professor in education, and an office assistant).

Appendix A

Experiment Goals and Status
Project Name: DESIRE (Development of a European Service for Information on Research and Education)
Project Goal: "enhancing existing European information networks for research users across europe through research and development in three main areas of activity: Caching, Resource Discovery and Directory Services."
URL: http://www.desire.org/
Beginning Date: 1996 (Phase I); July 1998 (Phase II)
Phase or Version No.: 2 phases; completed June 2000

Project Name: BIBLINK: Linking Publishers and National Bibliographic Services
Project Goal: "to establish a relationship between national bibliographic agencies and publishers of electronic material, in order to establish authoritative bibliographic information that would benefit both sectors."
URL: http://hosted.ukoln.ac.uk/biblink/
Beginning Date: Apr. 1, 1996
Phase or Version No.: 2 phases; completed Feb. 15, 2000

Project Name: ROADS (Resource Organisation and Discovery in Subject-based services)
Project Goal:
  "1. to produce a software package which can be used to set up subject-specific gateways
  2. to investigate methods of cross-searching and interoperability within and between gateways
  3. to participate in the development of standards for the indexing, cataloguing and searching of subject-specific resources"
URL: http://www.ilrt.bris.ac.uk/roads/
Beginning Date: 1995
Phase or Version No.: Completed

Project Name: Nordic Metadata
Project Goal:
  "1. enhancement of the existing dublin core specification
  2. creation of dublin core to marc converter.
  3. dublin core user support and tools evaluation.
  4. maintenance and development of metadata tools"
URL: http://linnea.helsinki.fi/meta/
Beginning Date: January 1999
Phase or Version No.: II

Project Name: OCLC CORC (Cooperative Online Resource Catalog)
Project Goal: "to assist libraries in providing their users with well-guided access to web resources"
URL: http://www.oclc.org/oclc/corc/
Beginning Date: January 1999
Phase or Version No.: open participation

*Thank you to Paulina Vinyard, Master's Student, School of Information and Library Science, University of North Carolina at Chapel Hill, for her assistance in the compilation of this table.

Library of Congress
January 31, 2001