Authority Control on the Web

Barbara B. Tillett

Final version

I won't repeat the literature reviews (1) or list all of the articles on authority control from this past century, since we have all read and discussed them. Instead, I will focus on how the authority control performed by libraries can help the Web and suggest some next steps in making this tremendous resource of authority records available and used internationally.

For over a quarter of a century we have been explaining within the library world the virtues of authority control in catalogs, bibliographies, finding aids, and other bibliographic lists to improve the precision of searches and to provide collocation. With the advent of the Internet, and the work of Dublin Core to involve non-library folks in the discussions, the word is out that libraries may have something there. Queries are arriving from intellectual property rights management groups, archives, museums, Web search engine companies, and corporations, and some of them are starting to realize they don't have to reinvent the wheel, or in this case international authority control. Let's help them achieve it!

THE WEB

The Web is chaotic. Many users get something back and think that's good enough, but may not realize the exact item they need wasn't retrieved. It may be because they tried an author's name and he/she used a pseudonym on the piece they want; or the corporate body they were trying to find changed names or used an acronym that they forgot to search by; or the editors of that famous work didn't use the well known title when it was most recently published. These and many other bibliographic variations cause searches to fail or to retrieve incomplete and sometimes misleading information.

Due to limits on the scope of cataloging, library catalogs don't give you the articles you want and often miss providing access to individual works in a collection or the contents of a compilation or conference proceedings. Authority control won't help that problem, but it will help ensure retrieval of all the works attributed to a particular bibliographic identity (in the Anglo-American cataloging tradition).

When we add library catalogs to the mix of online resources on the Web, we introduce controlled vocabularies for subjects, names (persons, corporate bodies, conferences), and titles. Online catalogs can now serve as gateways to online resources and vice versa. For example, we now provide hypertext links from bibliographic and soon authority records to resources available on the Web. By clicking on the link in the bibliographic or authority record, we launch an Internet connection to the online resource, which may connect to the full text document described in the bibliographic record, or a finding aid cited in the bibliographic record, or perhaps a biographical entry in a reference tool cited in an authority record for a person.

We could also have links on the Web to our catalogs from reference tools or online documents. Those links could allow a researcher to connect from another online tool directly to one or more user-selected online library catalogs to find works by and about the person or corporate body or to topical searches in that library or libraries. We already have this sort of capability in some systems where a user connected to abstracting and indexing services finds they are linked to the resources of library online catalogs that include holdings and location information on where to find specific issues or even to the online full-text article itself. I don't expect abstracting and indexing services to start using authority control for personal or corporate names - that battle has been fought for over a century - but we can help users once they are in the realm of our online catalogs to filter their search results to get what they want and not a lot of extraneous garbage. This may be through authority records that provide access and help distinguish between similar names or provide links from pseudonyms or other variant forms of name or subject terms. (2)

PRECISION AND RECALL

The addition of library catalogs to the mix of information being searched on the Web will open up the Web to focused, topical collections and resources held in and made accessible through the world's libraries. Catalogs have a basic syndetic structure that facilitates finding and gathering together of those resources in whatever media. Authority control enables "precision and recall," which are lacking from today's Web searches. Authority control provides precision to retrieve only those records or items of interest, and the syndetic structure of authority control's cross references assures recall of all the relevant materials, as well as navigation to reach bibliographically related materials. It cannot be stressed enough that this feature of online catalogs adds tremendous value to the user's search and retrieval process. No more wading through tens of thousands of retrieved and computer ranked results for anything close to what we asked for, unless we want to. Let's give users the option for more precise searching, if they want it.
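A small worked example may make the two measures concrete. This is a sketch with invented data: the mini-catalog, the names, and the helper functions below are hypothetical, and the "authority record" is reduced to an authorized form plus its see-from variants. Precision is the share of retrieved records that are relevant; recall is the share of relevant records that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of record ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mini-catalog: record id -> name as transcribed from the item.
CATALOG = {
    1: "Smith, Jon",
    2: "Smith, Jonathan",
    3: "Smith, J. A.",
    4: "Smith, Jonathan A.",
    5: "Smith, Joan",  # a different person
}
RELEVANT = {1, 2, 3, 4}  # the records actually by the person we want

# Hypothetical authority record: authorized form plus "see from" variants.
AUTHORITY = {
    "authorized": "Smith, Jonathan A.",
    "variants": {"Smith, Jon", "Smith, Jonathan", "Smith, J. A."},
}


def keyword_search(query):
    """Uncontrolled substring match, roughly what a Web search does with a name."""
    q = query.lower()
    return {rid for rid, name in CATALOG.items() if q in name.lower()}


def authority_search(authority):
    """Controlled search: gather every record carrying any form in the authority record."""
    forms = {authority["authorized"]} | authority["variants"]
    return {rid for rid, name in CATALOG.items() if name in forms}


print(precision_recall(keyword_search("Smith, Jo"), RELEVANT))  # (0.75, 0.75)
print(precision_recall(authority_search(AUTHORITY), RELEVANT))  # (1.0, 1.0)
```

The keyword search both misses "Smith, J. A." (lower recall) and picks up a different person (lower precision); the cross references in the authority record fix both.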

CONNECTING INTERNATIONAL AUTHORITY FILES

From January 1995 through December 1997, the European Commission funded the AUTHOR Project within CoBRA (Computerised Bibliographic Record Actions) to explore the international exchange and re-use of authority records for personal and corporate names. Five national bibliographic agencies participated in a prototype online authority file:

the Bibliothèque nationale de France (Project manager),
the British Library,
the Koninklijke Bibliotheek Albert I in Belgium,
the Biblioteca Nacional of Spain, and
the Biblioteca Nacional de Lisboa in Portugal.

Project AUTHOR converted a sample of about 100,000 authority records for a selected set of personal and corporate author names (all names beginning with the letter O or the letter T, plus a predefined set of names of persons and corporate bodies), along with additional records from each national authority file. The USEMARCON universal converter was used to convert the authority records to UNIMARC for the prototype database, which was then accessible via the Web using the Z39.50 protocol.
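USEMARCON is driven by its own rule files, whose syntax is not reproduced here. The sketch below only illustrates the general idea of rule-based conversion: a hypothetical rule table maps tags from a MARC 21-style authority record to UNIMARC authority tags (200/210 for authorized headings, 4XX for see-from references, 5XX for see-also references) and splits a personal name into UNIMARC's entry element ($a) and the rest of the name ($b). The sample record is illustrative, and a real conversion also handles indicators, coded data, and many more fields.

```python
# Hypothetical rule table: source tag -> (UNIMARC tag, is the field a personal name?)
TAG_RULES = {
    "100": ("200", True),   # authorized heading, personal name
    "110": ("210", False),  # authorized heading, corporate name
    "400": ("400", True),   # see-from reference, personal name
    "410": ("410", False),  # see-from reference, corporate name
    "500": ("500", True),   # see-also reference, personal name
    "510": ("510", False),  # see-also reference, corporate name
}


def convert_field(tag, subfields):
    """Convert one field given as a list of (code, value) pairs.

    Returns (new_tag, new_subfields), or None when no rule exists for the tag.
    """
    rule = TAG_RULES.get(tag)
    if rule is None:
        return None
    new_tag, personal = rule
    new_subfields = []
    for code, value in subfields:
        if personal and code == "a" and ", " in value:
            # "Surname, Forenames" -> $a entry element, $b rest of the name
            entry, rest = value.split(", ", 1)
            new_subfields += [("a", entry), ("b", rest)]
        else:
            new_subfields.append((code, value))
    return new_tag, new_subfields


def convert_record(fields):
    """Convert a record given as a list of (tag, subfields).

    Unmapped fields are simply dropped here; a production converter would report them.
    """
    converted = []
    for tag, subfields in fields:
        result = convert_field(tag, subfields)
        if result is not None:
            converted.append(result)
    return converted


source = [
    ("100", [("a", "Tillett, Barbara B.")]),
    ("400", [("a", "Tillett, Barbara")]),
    ("999", [("a", "local data")]),
]
print(convert_record(source))
# [('200', [('a', 'Tillett'), ('b', 'Barbara B.')]), ('400', [('a', 'Tillett'), ('b', 'Barbara')])]
```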

The challenge was that each library has its own language, cataloging rules, bibliographic record format, and local system for its online authority file. There were 5 cataloging languages: Dutch, English, French, Portuguese, and Spanish, plus a bilingual catalog in French and Dutch. There were 5 sets of cataloging rules: AACR2(UK) and national standards for Belgium, France, Portugal, and Spain. There were 5 MARC formats: KBRMARC (Belgium), INTERMARC (France), BLMARC (UK), UNIMARC (Portugal), and IBERMARC (Spain). There were 4 local systems: GEAC (France and Portugal), VUBIS (Belgium), WLN (UK), and ARIADNA (Spain). These factors naturally presented interesting obstacles in sharing authority information.

In their report on findings (3), Sonia Zillhardt and Françoise Bourdon noted that the study revealed different practices and rules for making authority records for specific entities. Although the similarities in rules and practices were great, some obvious differences were apparent. For example, not all of the libraries consider the names of conferences as candidates for authority control (Spain, Portugal, and Belgium); or when conferences are included, they are considered corporate names (UK) rather than a separate category (France). The French treat territorial names as separate from corporate names, unlike the other libraries. General explanatory reference records and reference entry records were not present in this prototype, and indeed are not present in any of the authority files managed by the AUTHOR project partners.

Other differences involve the various MARC formats and transliteration practices. The various MARC formats have different elements and tag them differently; for example, in France and Belgium they include the nationality of the person or corporate body, but that data element may simply be buried in a note, if present at all, in the UK, Portugal, or Spain. Then there is the single versus multiple linked record dilemma for parallel authorized forms for the same entity in different languages or scripts. In Belgium, they create a single authority record with the French and Dutch parallel authorized forms of name, and such records were turned into two linked records when converted to UNIMARC for the project. There were also the obvious differences in transliteration schemes used by the different libraries.
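The single-versus-multiple-record question can be pictured as a data transformation. A minimal sketch, assuming a simplified record model (the `AuthorityRecord` class, the record-number scheme, and the sample record number are hypothetical): one record carrying French and Dutch parallel authorized forms becomes one record per language, each pointing at the others, roughly what happened to the Belgian records when they were converted to UNIMARC for the project.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    record_id: str
    language: str  # MARC language code of the heading
    heading: str   # authorized form in that language
    linked_ids: list = field(default_factory=list)  # parallel records for the same entity


def split_parallel_forms(source_id, headings):
    """Turn one record holding parallel authorized forms (language -> heading)
    into one record per language, each linked to all the others."""
    records = [
        AuthorityRecord(f"{source_id}-{lang}", lang, heading)
        for lang, heading in headings.items()
    ]
    for record in records:
        record.linked_ids = [other.record_id for other in records if other is not record]
    return records


belgian = split_parallel_forms(
    "KBR-0001",  # hypothetical source record number
    {"fre": "Bibliothèque royale de Belgique", "dut": "Koninklijke Bibliotheek van België"},
)
for rec in belgian:
    print(rec.record_id, rec.heading, rec.linked_ids)
```

Keeping the links explicit means either record can later be used to reach its parallel form, which the single-record model provided implicitly.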

Earlier, the IFLA Section on Cataloguing pointed out some of these same problems when linking single-language and/or multi-language name authority files (4):

There are also problems in different cataloging codes that traditionally recognize different entities as authors and hence providing authority records for those entities. For example, AACR2 recognizes names of ships as "authors" of the ship's logs and recognizes events as entries for publications resulting from that event, while most other cataloging rules do not make this allowance, but instead may include such access as an added entry, if at all. So an authority record in one authority file may not have a counterpart in another national authority file, simply because it is not recognized by the cataloging rules as being eligible. Or there may be differences in the hierarchical levels used by different cataloging rules to represent an entity - such as the conference proceedings of a corporate entity where the AACR2 places conferences as a subheading under the name of the corporate body to group them together. This is a device that collocates the works of that corporate body, but other rules, such as the German RAK (Regeln für die alphabetische Katalogisierung) would enter the name of the conference itself or use title entry, not creating the cataloger's corporate heading that AACR2 prescribes. The result is no matches when comparing authority records from one authority file to another.

Another experiment with multiple authority files is being proposed within IFLA (International Federation of Library Associations and Institutions), and several groups have already started work towards creating a virtual international authority file. Unlike the AUTHOR project, which created a UNIMARC database of exchanged records from various national authority files, the IFLA project would link existing online authority files through a Z39.50 simultaneous search of the identified national authority files. It would explore ways to provide interoperability across multiple authority files, to link authority records for the same entity through existing record numbers, and to provide switching for displays of authorized headings on an international scale. As a first step, the IFLA UBCIM Working Group on MLAR (Minimal Level Authority Records) and ISADN (International Standard Authority Data Number) reported its recommendations on the mandatory minimal set of data elements that should be present in all authority records to facilitate international exchange or use. (5) A follow-on group within IFLA, FRANAR (Functional Requirements and Numbering for Authority Records), is now exploring the numbering and functional requirements for authority records. The IFLA Section on Cataloguing Working Group on FSCH (Form and Structure of Corporate Headings) is exploring the structures and forms of corporate names to inform developers of future systems and the development of the virtual international authority file. Great benefit may be gained from sharing authority information on an international level. Work continues in this area, and I'll have more to report after the IFLA Conference in Jerusalem in August 2000.
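The simultaneous-search idea can be sketched without any Z39.50 machinery. In the hypothetical Python below, each national file is an in-memory dictionary standing in for a remote Z39.50 target, the agency codes, record numbers, and headings are invented, and a thread pool plays the role of the parallel searches. Each hit keeps the agency and that agency's own record number, which is what the proposal would use to link records for the same entity.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for national authority files, keyed by (invented) agency code.
# Each maps that agency's own record number to the forms it holds for one entity.
AUTHORITY_FILES = {
    "AAA": {"aaa-17": {"authorized": "Example, Ana, 1950-", "variants": ["Example, A."]}},
    "BBB": {"bbb-42": {"authorized": "Example, Ana (1950-....)", "variants": ["Example, Ana, 1950-"]}},
    "CCC": {},
}


def search_file(agency, query):
    """Exact, case-insensitive match against authorized and variant forms in one file."""
    q = query.casefold()
    hits = []
    for number, record in AUTHORITY_FILES[agency].items():
        forms = [record["authorized"], *record["variants"]]
        if any(q == form.casefold() for form in forms):
            hits.append((agency, number, record["authorized"]))
    return hits


def search_all(query):
    """Query every file at once and merge the hits, keeping each agency's record number."""
    with ThreadPoolExecutor(max_workers=len(AUTHORITY_FILES)) as pool:
        per_file = pool.map(lambda agency: search_file(agency, query), AUTHORITY_FILES)
        return [hit for hits in per_file for hit in hits]


print(search_all("Example, Ana, 1950-"))
```

Note that the second file matches on a variant form, not its authorized form: the cross references are what let a search keyed with one country's heading find another country's record.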

DIGITAL ENVIRONMENT AND METADATA

Crosswalks, like those provided in CORC, link Dublin Core metadata and cataloging rule-based records in MARC and other formats with XML and other communication structures, and expand the opportunities for contributing authority records to an international pool. Standards and agreements are emerging, like a Dublin Core for Authorities (work at the Deutsche Bibliothek) and the basic mandatory data elements recommended in IFLA's "Minimal Level Authority Record," as noted above. [To be expanded]
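At its simplest, a crosswalk is a mapping table applied element by element. The sketch below is illustrative only: the table loosely follows the Library of Congress Dublin Core-to-MARC crosswalk for a few unqualified elements, it is not the CORC implementation, and elements without a mapping are returned for human review rather than silently lost. An unqualified creator lands in an uncontrolled name field, exactly the kind of string an authority record could later bring under control.

```python
# Illustrative crosswalk: unqualified Dublin Core element -> (MARC tag, subfield code).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),  # 720 = added entry, uncontrolled name
    "subject": ("653", "a"),  # 653 = uncontrolled index term
    "date": ("260", "c"),
}


def crosswalk(dc_record):
    """dc_record maps element names to lists of values.

    Returns (fields, unmapped): MARC-style (tag, subfield, value) triples sorted by tag,
    and the (element, value) pairs that have no mapping in the table.
    """
    fields, unmapped = [], []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        for value in values:
            if target is None:
                unmapped.append((element, value))
            else:
                fields.append((target[0], target[1], value))
    return sorted(fields), unmapped


fields, unmapped = crosswalk({
    "title": ["Authority Control on the Web"],
    "creator": ["Tillett, Barbara B."],
    "format": ["text/html"],
})
print(fields)
print(unmapped)
```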

MULTIPLE SCRIPTS

The combination of Unicode and new technologies is opening up access to all scripts and all languages. Many libraries used to create handwritten catalog entries in book catalogs or old card catalogs and could write in original scripts when transcribing title page information. Even the early printed cards from the Library of Congress included beautiful scripts in the description, with accompanying "filing title" information giving transliterated forms of the titles so that these records could be integrated into roman-alphabet card catalogs. With the early online cataloging tools, that multiple-script capability was abandoned because the technology could not handle it. Later, RLIN and OCLC added capabilities for selected scripts, and now we see on the horizon the potential of using Unicode to present all scripts for all languages in bibliographic and authority records. Within another year this will be a reality in several online systems, opening up the technical capability.

These possibilities also open the way to sharing information on a global scale, and experiments are already underway. One such experiment is this year's progress among the Hong Kong consortium of research libraries to provide authority records with both Chinese authorized headings in Chinese script and parallel authorized headings from the Library of Congress in the roman alphabet, allowing access from either form.
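Access from either form amounts to indexing every parallel heading under the same record. A minimal sketch, with a hypothetical record number and sample forms: the index normalizes each form to Unicode Normalization Form C (NFC) and case-folds it, so that a heading keyed with precomposed characters and one keyed with base letters plus combining marks still find each other. Python's standard `unicodedata.normalize` does the normalization; a production system would need script-aware matching rules well beyond this.

```python
import unicodedata


def normalize(form):
    """NFC-normalize and case-fold a heading so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", form).casefold()


class ParallelHeadingIndex:
    """Maps every parallel or variant form to the record(s) that carry it."""

    def __init__(self):
        self._index = {}

    def add(self, record_id, forms):
        for form in forms:
            self._index.setdefault(normalize(form), set()).add(record_id)

    def lookup(self, form):
        return self._index.get(normalize(form), set())


index = ParallelHeadingIndex()
# Hypothetical record with a Chinese-script heading, a roman-alphabet heading, and a pinyin variant.
index.add("hk-0001", ["魯迅", "Lu, Xun, 1881-1936", "Lǔ Xùn"])

print(index.lookup("魯迅"))                # {'hk-0001'}
print(index.lookup("lu, xun, 1881-1936"))  # {'hk-0001'}
print(index.lookup("Lu\u030c Xu\u0300n"))  # {'hk-0001'}: decomposed tone marks still match
```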

SWITCHING FOR DISPLAYS

This also gets to a point I've been pushing for a long time - that of "access control" instead of "authority control." I still haven't found another term to use for this concept, but the idea is to control collocation, so the library or the user can select the form of the controlled heading they want to see - the system could switch the display to the chosen form or a default form set by the library. Authority control pulls together all the various forms and relates entities in a way that leads the user to the desired materials and provides a big picture of what is available. With "access control" the same underlying authority records provide control, but the display form is user-selected. In the international context, users may prefer to see headings in their own script, may want to see the names of works in their own language or the names of corporate bodies in a well-known form that may not follow any cataloging rules. Computers let us do this sort of thing through default displays for a given library catalog or a user-selected choice, perhaps recorded in their own computer "client" with an intelligent system making the switch for them before displaying records or entries.

This switching can be accomplished in many ways, such as through using only a number or other identifier for the entity that links to the authority record to display the chosen form. For now that is a single form prescribed by the library, but it could also be a form in the user's own language or script. Many systems include the authorized form of the name as a text string and may have an associated authority record number for the entity represented by the text string. Through either the text string or the record number link, one can navigate to associated authority records in different countries with different languages and cataloging rules to display their chosen form. This concept is being explored in IFLA.
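One way to picture this switching: the bibliographic record stores only an entity identifier, and the display form is resolved at display time from the user's ordered preferences, falling back to the library's default. Everything named here (the identifier, the preference labels, the forms) is hypothetical; the labels happen to be BCP 47 language tags, but any agreed labels would do.

```python
# Hypothetical authority data: entity identifier -> forms labelled by language/script.
FORMS_BY_ENTITY = {
    "ent-0001": {
        "default": "Lu, Xun, 1881-1936",  # the library's chosen authorized form
        "zh-Hant": "魯迅",
        "en": "Lu Xun",
    },
}

# The bibliographic record carries the identifier, not a text string.
BIB_RECORD = {"title": "A sample title", "creator": "ent-0001"}


def display_form(entity_id, preferences, default_label="default"):
    """Return the first form matching the user's ordered preferences, else the library default."""
    forms = FORMS_BY_ENTITY[entity_id]
    for label in preferences:
        if label in forms:
            return forms[label]
    return forms[default_label]


def render(bib, preferences):
    return f"{display_form(bib['creator'], preferences)}. {bib['title']}"


print(render(BIB_RECORD, ["zh-Hant", "en"]))  # 魯迅. A sample title
print(render(BIB_RECORD, ["fr"]))             # Lu, Xun, 1881-1936. A sample title
```

Because the record holds the identifier, changing the preference list changes what the user sees without touching the bibliographic record itself.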

ISADN (International Standard Authority Data Number)

In 1982 Nancy Williamson predicted that by 2006 catalogs would have invisible links between variant forms of names to retrieve all the bibliographic works of a particular person. (6) Many agree this would be lovely, but what would be behind those invisible links to make it work? Some have suggested a standard number.

During the late 1970s an IFLA Working Group led by Tom Delsey suggested establishing an International Standard Authority Number (ISAN) and described the organizational structure for controlling such numbers and their assignment and maintenance. Delsey recognized that the practical aspects of administering such a number were "far from simple." (7) The idea of the ISADN (International Standard Authority Data Number) was reiterated in the Guidelines for Authority and Reference Entries published by IFLA in 1984. (8) Unfortunately, the cost of an international organization to manage such a system was prohibitive, and technology had not yet advanced to a point where it could assist such international control, so the idea fell by the wayside.

A model put forward by Snyman and Jansen van Rensburg suggested using an International Standard Author Number (ISAN) (9), which they later label as "INSAN." (10) Despite unfortunate problems with their historical facts and citations to earlier work in this area, Snyman and van Rensburg offer the same solution of a single number used universally. Their number contains 18 characters:

the first 2 alphabetic characters identify the agency responsible for issuing the author number,
the next 2 alphabetic characters identify the nationality of the author (a big problem for the United States, where we tend to catalog materials for authors worldwide and not just for our own country),
the next 3 alphabetic characters identify the language typically used in the author's original publications (also problematic for today's authors),
the next 4 numeric characters give the year of issue of the number,
the next 6 numeric characters are a serial number assigned incrementally for the INSAN (allowing for a million new "authors" per year per agency), and
a final check digit.
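The proposed layout is easy to check mechanically. The sketch below parses an 18-character number into the fields listed above; the sample number is invented, and because the description gives no check-digit algorithm, the code only requires a final character (a digit or X, an assumption borrowed from other standard numbers) without verifying it. It also shows the arithmetic behind the "million authors" figure: six decimal digits give 10^6 serial numbers per agency per year.

```python
import re
from typing import NamedTuple


class Insan(NamedTuple):
    agency: str       # 2 letters: issuing agency
    nationality: str  # 2 letters: author's nationality
    language: str     # 3 letters: language of original publications
    year: int         # 4 digits: year the number was issued
    serial: int       # 6 digits: serial number within agency and year
    check: str        # 1 character: check digit (algorithm not specified; not verified here)


# 2 + 2 + 3 letters, 4 + 6 digits, 1 check character = 18 characters
INSAN_PATTERN = re.compile(r"([A-Z]{2})([A-Z]{2})([A-Z]{3})([0-9]{4})([0-9]{6})([0-9X])")

SERIALS_PER_AGENCY_PER_YEAR = 10 ** 6  # six decimal digits: 000000 through 999999


def parse_insan(number):
    match = INSAN_PATTERN.fullmatch(number)
    if match is None:
        raise ValueError(f"not a well-formed 18-character INSAN: {number!r}")
    agency, nationality, language, year, serial, check = match.groups()
    return Insan(agency, nationality, language, int(year), int(serial), check)


print(parse_insan("ZAZAAFR20000000017"))  # hypothetical number
# Insan(agency='ZA', nationality='ZA', language='AFR', year=2000, serial=1, check='7')
```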

There have been many calls for the use of a standard number for bibliographic entities, such as those proposed in the 1970s and more recently by ICA, IFLA, and others. But has such a single-number approach passed its time in terms of what we can now technically accomplish? Given today's technologies with hyperlinks, URLs, and other mechanisms to connect records and to identify and display content, there may be better ways to link, navigate, and display authorized headings.

The simple, elegant idea of a single number to universally control the names of persons, corporate bodies, and works persists to this day. It is attractive to those dealing with copyright and other intellectual property rights, and to archives, libraries, and museums wishing to share the cost of bibliographic and authority control. The IFLA UBCIM Working Group on Minimal Level Authority Records and ISADN, in its 1998 final report, "Mandatory Data Elements for Shared Authority Records," stated that such numbers may not be needed if one instead used the existing authority record control numbers and linked them across the authority files of the major national bibliographic agencies. Despite this recommendation, some members of IFLA itself persisted in calling for an ISADN (International Standard Authority Data Number). So another IFLA UBCIM Working Group is now in progress, looking yet again at the functional requirements and numbering for authority records (FRANAR).

AUTHORITY RECORD RESOURCES

A pool of authority records for bibliographic entities (persons, corporate bodies, works/expressions, concepts, objects, events, and places) to use on the Internet is of interest not only to libraries and their users but also to publishers, copyright and rights management organizations, museums, and archives. We already have several major authority files created by national bibliographic agencies, such as the Library of Congress and the national libraries of Belgium, France, Germany, Italy, Portugal, and Spain, to name a few. This wealth of information is a huge resource that holds great potential for enabling the controlled access of the future on a global scale across many applications for libraries, intellectual property rights, archives, museums, etc.

MODEL

Let's explore one scenario for how this all might actually work for name and title authority data. Taking a practical approach to use what we already have rather than to establish a complex system to create yet another control number, let's look at one model.

There are multiple objectives:

All very grand and wonderful, but where to begin?

We have the existing authority files from major national bibliographic agencies and IFLA can maintain a list of those agencies that would be willing to share their authority records on the Web.

We know the composition of those authority records from earlier IFLA studies and can map the mandatory data elements for a Z39.50 profile. [recommendation: have LC establish such a profile]

How to assure pulling up all existing authority records for the same entity when forms may vary from file to file? Rather than creating a worldwide system for assigning an ISADN, we can use the existing authority record control numbers to provide a link, and to make display easier, it may be useful to also include the text string for the authorized form for the name.

That text string and record identifier from another nation's authority file could be used to switch the form of headings in shared bibliographic records either when cataloging or when displaying the records to users.
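Linking through existing record numbers can be sketched with a union-find (disjoint-set) structure: each asserted link between two agencies' records for the same entity merges their groups, so links chain transitively without any new international number. The keys pair an agency code with that agency's own record number; the class, agency codes, numbers, and headings below are all hypothetical. The stored text strings then let a heading be switched to the form used by a chosen agency.

```python
class AuthorityLinks:
    """Groups (agency, record_number) keys that describe the same entity."""

    def __init__(self):
        self._parent = {}
        self._heading = {}  # key -> authorized text string in that agency's file

    def add(self, key, heading):
        self._parent.setdefault(key, key)
        self._heading[key] = heading

    def _find(self, key):
        # Walk to the group's root, halving the path as we go.
        while self._parent[key] != key:
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def link(self, a, b):
        """Record that a and b are authority records for the same entity."""
        self._parent[self._find(a)] = self._find(b)

    def same_entity(self, key):
        root = self._find(key)
        return {k: h for k, h in self._heading.items() if self._find(k) == root}

    def form_for(self, key, agency):
        """Switch a heading to the form used by `agency`, if that agency has a linked record."""
        for (other_agency, _), heading in self.same_entity(key).items():
            if other_agency == agency:
                return heading
        return None


links = AuthorityLinks()
links.add(("AAA", "aaa-17"), "Example, Ana, 1950-")
links.add(("BBB", "bbb-42"), "Example, Ana (1950-....)")
links.add(("CCC", "ccc-9"), "Ejemplo, Ana, 1950-")
links.add(("AAA", "aaa-99"), "Other, Person")
links.link(("AAA", "aaa-17"), ("BBB", "bbb-42"))
links.link(("BBB", "bbb-42"), ("CCC", "ccc-9"))
print(links.form_for(("CCC", "ccc-9"), "AAA"))  # Example, Ana, 1950-
```

The `same_entity` scan is linear in the size of the table, which is fine for a sketch; a real service would index groups by root.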

As noted in the IFLA recommendations on mandatory data elements in shared authority records, such authority records should have the text strings for authorized forms and variant forms of name and a record of related entities, as well as the number for the entity. (There are 19 mandatory data elements prescribed by IFLA and another 3 elements that are highly recommended.) (11) That entity identifier may be reflected in the record number for the bibliographic identity of the entity, such as LC's control number (LCCN). That number needs to uniquely identify the record for the entity (or bibliographic identity in AACR2 terms) and by extension it can be used to identify the entity/bibliographic identity itself.

As a quick aside, note the distinction between authority records and authority entries. An authority record is a device to record decisions; it is used for maintaining and displaying a chosen authorized form, along with links (references) from variant forms of name for a given bibliographic identity/entity and links to the names of related entities (see also references). An authority entry, or authorized heading, is the chosen controlled form of the heading, used as the access point or as the display form for the name of an entity.

The internal workings of an online system can store text strings, numbers, codes, or other mechanisms to then display the authorized heading to the user. How a system accomplishes this should be transparent to the user. The system could also display any chosen form the user requests or a default form chosen by the agency presenting the information to the user. Currently, we use a default form that is specially tagged in an authority record as the authorized form according to our cataloging rules.

So back to our model.

Original Cataloging
Scenario 1 - match found for same entity
  1. Cataloger A begins cataloging a new item (creating a bibliographic record) and identifies a name (personal, corporate, conference, or uniform title). Cataloger A's online system automatically checks the local authority file and if not found also checks the virtual international authority file (automatically launching a Z39.50 connection behind the scenes) to see if that name has already been established somewhere in the world. Hopefully future online connections will be much faster and more reliable than they are today!
  2. Let's say the name was established by Cataloger X across the world and the record was found and displayed to Cataloger A.
  3. Cataloger A confirms it is the same entity.
  4. If the Cataloger X authority record can be used as is, Cataloger A lets the system know it is ok.
  5. The record is automatically added to the local authority file with a system generated local authority record control number, preserving the control number and text string found in Cataloger X's original authority record.

Variation on Scenario 1 - the international authority file finds two or more matches and displays them to Cataloger A, who can then select the one closest to their needs (more complete, or matching the language and/or cataloging rules used by Cataloger A's library).
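The flow of Scenario 1 and its variation can be sketched as a small function, with the cataloger's judgments passed in as callables. This is a hypothetical sketch: `LocalAuthorityFile`, the local numbering scheme, the match layout, and the `search_international` stand-in for the Z39.50 search are all assumptions. The fallback branch anticipates the different-entity and no-match cases described in the scenarios that follow; editing (Scenario 2) is left out.

```python
import itertools


class LocalAuthorityFile:
    """Hypothetical local file keyed by system-generated local control numbers."""

    def __init__(self):
        self.records = {}
        self._numbers = itertools.count(1)

    def find(self, heading):
        for number, record in self.records.items():
            if record["heading"] == heading:
                return number
        return None

    def _new_number(self):
        return f"local-{next(self._numbers)}"

    def add_from_international(self, match):
        number = self._new_number()
        self.records[number] = {
            "heading": match["heading"],
            # Preserve the source agency, its control number, and its text string for future links.
            "source_agency": match["agency"],
            "source_number": match["number"],
            "source_heading": match["heading"],
        }
        return number

    def add_automatic(self, heading):
        number = self._new_number()
        self.records[number] = {"heading": heading, "source_agency": None,
                                "source_number": None, "source_heading": None}
        return number


def establish_heading(heading, local, search_international, is_same_entity,
                      choose=lambda matches: matches[0]):
    """Return the local control number for `heading`."""
    existing = local.find(heading)            # check the local file first
    if existing is not None:
        return existing
    matches = search_international(heading)   # then the virtual international file
    if matches:
        candidate = choose(matches)           # variation: cataloger picks among several matches
        if is_same_entity(candidate):         # cataloger confirms it is the same entity
            return local.add_from_international(candidate)
    # Different entity or no match: a system-generated record for the cataloger to edit.
    return local.add_automatic(heading)


INTERNATIONAL = {"Example, Ana, 1950-": [
    {"agency": "BBB", "number": "bbb-42", "heading": "Example, Ana, 1950-"}]}
local = LocalAuthorityFile()
first = establish_heading("Example, Ana, 1950-", local,
                          lambda h: INTERNATIONAL.get(h, []), lambda m: True)
print(first, local.records[first]["source_number"])  # local-1 bbb-42
```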



Original Cataloging
Scenario 2 - match found for same entity but needs editing

Same as Scenario 1 except for step 4, where Cataloger A decides to edit the existing authority record to meet the cataloging rules and practices of Cataloger A's library. Cataloger A then lets the system know the record is ready.
5. Same as Scenario 1, step 5 - The record is automatically added to the local authority file with a system generated local authority record control number, preserving the control number and text string found in Cataloger X's original authority record.

Original Cataloging
Scenario 3 - match found but too time consuming to edit

Same as Scenario 1 except for step 4 where Cataloger A determines editing would be too time consuming and has the system generate an automatic authority record that is then edited as needed.
5. Cataloger A confirms and tells the local system this is the same entity.
6. Same as scenario 1, step 5 - The record is automatically added to the local authority file with a system generated local authority record control number, preserving the control number and text string found in Cataloger X's original authority record.

Original Cataloging
Scenario 4 - match determined to be for different entity

Same as Scenario 1 except for step 3, where Cataloger A determines the found authority record from Cataloger X is not for the same entity.
4. The local system creates an automatic authority record that can then be used or edited and added to the local authority file.

Original Cataloging
Scenario 5 - no match found in local or international authority file

The local system creates an automatic authority record that can then be used or edited and added to the local authority file.

Original or Copy Cataloging
Scenario 6 - match in local authority file without an internationally linked authority record

Let's say Cataloger B in the same library now keys in the heading for the same entity on an original record for another item, or the heading appears on a record from copy cataloging. The local system finds the matching authority record and alerts Cataloger B that the heading is already established.

Note that the cataloger could choose to launch a check against the international authority file at this point if desired for future links.


Copy Cataloging
Scenario 7 - match in local authority file with internationally linked authority record

Suppose Cataloger B is doing copy cataloging and brings in a record from another country that used a different form for its authorized heading. The system looks for the heading in the local authority file and notices that the text string matches a parallel authorized form from that other country (preserved when the international authority file record was captured). It then either automatically switches the form in the bibliographic record to match the local authorized form, or displays that form (or the user's chosen form) on the fly when the record is presented to a user. This display capability could actually apply to any of the scenarios for alternate forms of the name found in the authority record.

Copy Cataloging
Scenario 8 - no match in local authority file

Cataloger C brings in a copy cataloging record from another country with no matches in the local authority file, so the system launches a search of the virtual international authority file and displays any matches (on a reference or an authorized form, or near matches). If there is a match, then we are back to the same process as with Cataloger A: either use the record as is, edit it, or create a new authority record, linking when it's the same entity. If there is no match, the local system automatically generates a base authority record that the cataloger can use or edit as needed.
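Scenarios 6 through 8 reduce to a small decision procedure for each heading on an incoming copy record. The function below is a hypothetical sketch: the record layout, the action names, and the `search_international` callable (standing in for the Z39.50 search of the virtual international authority file) are assumptions, and display-time switching is left out.

```python
def route_copy_heading(heading, local_records, search_international):
    """Decide what to do with a heading on an imported copy-cataloging record.

    local_records maps local control numbers to dicts holding the local "heading" and the
    "parallel_forms" preserved from linked international records.
    Returns (action, heading_to_use, detail).
    """
    for number, record in local_records.items():
        if heading == record["heading"]:
            return "already_established", heading, number  # Scenario 6
        if heading in record["parallel_forms"]:
            return "switched", record["heading"], number     # Scenario 7
    matches = search_international(heading)                   # Scenario 8
    if matches:
        return "review_matches", heading, matches  # use as is, edit, or create new and link
    return "create_automatic", heading, None       # base record for the cataloger to edit


LOCAL = {"local-1": {"heading": "Example, Ana, 1950-",
                     "parallel_forms": {"Example, Ana (1950-....)"}}}
print(route_copy_heading("Example, Ana (1950-....)", LOCAL, lambda h: []))
# ('switched', 'Example, Ana, 1950-', 'local-1')
```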

Are you getting the idea? And you may have other suggestions for how this could play out.

ONE SIZE DOES NOT FIT ALL

[Why we need cross references that go with the rules and the chosen authorized heading, and how the alternate authorized form from other cataloging rules can be used as either another variant form (x ref, see from) or a related heading (xx ref, see also from). To be expanded.]

Why not create one giant authority file that combines all the variant forms from all the authority files and lets the user decide which form to display? Cross references follow rules for a given catalog's syndetic structure. One cannot just combine all the references from different authority records created from different cataloging rules and principles and have it work elegantly. Differences in display order and filing rules, rules for additions to and omissions from names - all combine to destroy syndetic structures. Users would find themselves buried in variations.

SUBJECT AUTHORITY

I've focused mostly on sharing of name and title authority information, but there is the whole universe of subject authority control and efforts to link various subject heading schemes, thesauri, and subject classification systems. Experts, such as Karen Markey Drabenstott in particular, have pointed to ways to improve subject searching on the Web, and much work is still needed in this field. [citations to be added] Gail Hodge also recently suggested using knowledge organization systems that include authority files, glossaries, dictionaries, gazetteers of place names, classification schemes, etc. to help structure digital libraries, and this can be extended to online information on the Web. (12)

FUTURE

Authority control remains the most expensive part of cataloging, but through cooperative efforts like NACO, SACO, and IFLA initiatives, the research done in one library can be shared internationally to lower the cost. We have a wonderful opportunity to really make this work. More prototyping is rumored to be going on in Europe and Chinese libraries are beginning to make links across national authority files. Let's do it.


  1. For the curious, some examples are Arlene Taylor's "Research and Theoretical Considerations in Authority Control," in Tillett, Barbara B. Authority Control in the Online Environment. Haworth Press, 1989 (also published as Cataloging & Classification Quarterly, v. 9, no. 3, 1989, p. 29-56); Larry Auld's "Authority Control: An Eighty-Year Review," Library Resources & Technical Services, v. 26 (Oct./Dec. 1982), p. 319-330; and Barbara Tillett's "Automated Authority Files and Authority Control: A Survey of the Literature," seminar paper, Graduate School of Library and Information Science, University of California, Los Angeles, June 1982; with corrections and additions, October 1982.
  2. I recently came across an article that describes what I was already exploring with Oxford University Press, namely using authority records as a link with biographical information. The Web article provides an interesting set of suggestions to "improve the organization of digital libraries and facilitate access to their content" (abstract) - see Gail Hodge, "Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files," CLIR Publications & Resources, pub91 (April 2000) (Available on the Web as: http://www.clir.org/pubs/reports/pub91)
  3. Zillhardt, Sonia and Françoise Bourdon. AUTHOR Project : Transnational Application of national Name Authority Files, Library Project PROLIB/COBRA-AUTHOR 10174, Final report. Paris : Bibliothèque nationale de France, 1998 (available from the authors).
  4. Murtomaa, Eeva and Eugenie Greig with help of Joan Aliprand. "Problems and Prospects of linking Various Single-Language and/or Multi-language name Authority Files," International Cataloguing and Bibliographic Control, v. 23, no. 3 (July/Sept. 1994), p. 55-58
  5. IFLA Working Group on MLAR and ISADN. Mandatory Data Elements for Internationally Shared Resource Authority Records : Report of the IFLA UBCIM Working Group on Minimal Level Authority Records and ISADN. [Frankfurt]: International Federation of Library Associations and Institutions, Universal Bibliographic Control and International MARC Programme, 1998.
  6. Williamson, Nancy J. "Is there a Catalog in Your Future? Access to Information in the Year 2006," Library Resources & Technical Services, v. 26 (April 1982), p. 122-135.
  7. Delsey, Tom. "Authority Control in an International Context." In: Tillett, Barbara B. Authority Control in the Online Environment : Considerations and Practices. New York: Haworth Press, 1989, p. 25.
  8. Guidelines for Authority and Reference Entries, recommended by the Working Group on an International Authority System, approved by the Standing Committee of the IFLA Section on Cataloguing and the IFLA Section on Information Technology. London: IFLA International Office for UBC, 1984.
  9. Snyman, M. M. M. [and] M. Jansen van Rensburg. "Reengineering name authority control," The Electronic Library, v. 17, no. 5 (Oct. 1999), p. 313-322.
  10. Snyman, M. M. M. [and] M. Jansen van Rensburg. "Revolutionizing name authority control," Digital Libraries, San Antonio, TX, ACM, 2000, p. 185-194.
  11. Op. cit. IFLA Working Group on MLAR and ISADN. Mandatory Data Elements for Internationally Shared Resource Authority Records, p. 3-6.
  12. Hodge, Gail. "Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files," CLIR Publications & Resources, pub91 (April 2000) (Available on the Web as: http://www.clir.org/pubs/reports/pub91)

Library of Congress
January 23, 2001