This is an NDLP Documentation DRAFT

How MARC fields are indexed for American Memory and Digital OneBox

As of 10/01/97



INTRODUCTION

The indexing for American Memory MARC records is controlled through two tables. The first table maps MARC fields into a smaller set of fields that InQuery uses. The second table specifies how InQuery should index each field.

Important note: How MARC fields are indexed is not directly related to how the fields are displayed in item-level bibliographic displays.

Currently, a default search in American Memory searches most subfields of the MARC fields listed under "BASIC SEARCHING: Which fields are searched?" below.

The current indexing framework provides the potential for fielded searching on any InQuery field whose field index is ON in the table from dm_trans_tab.c below.

Note: The distinction between fields in upper and lower case is preserved here for consistency. It is only of relevance for people who use InQuery commands directly, rather than through structured search forms.

TABLES THAT CONTROL INDEXING

The indexing rules specified through these tables control how and whether you can search for words in a particular MARC field or subfield.

From marc-fields.pp

This table controls how MARC fields and subfields are mapped to the InQuery fields that support the indexing process.

  from       to       include       exclude      INQUERY
  MARC       MARC     subfields     subfields    fields
 field      field     in index      from index   (support
                      and exclude   and include   indexing)
                      others        others
   001       NULL      NULL            NULL        DOCID        Record ID #
   010       NULL      NULL            NULL        M010       
   100       111       NULL            NULL        AUTHOR       Author
   245       NULL      abp             NULL        TITLE        Title
   246       NULL      abp             NULL        TITLE
   005       NULL      NULL            NULL        cdate
   017       NULL      a               NULL        NUMBER
   037       NULL      a               NULL        NUMBER
   050       NULL      NULL            u           CALL         LC call number
   260       NULL      ab              NULL        PUB
   260       NULL      c               NULL        date
   300       NULL      NULL            NULL        medium
   510       NULL      c               NULL        M510c
   500       599       NULL            NULL        notes        All notes
   600       653       NULL            2           SUBJ
   655       699       NULL            2           SUBJ
   700       711       NULL            NULL        OTHER 
   730       740       NULL            ab          TITLE        Uniform or
                                                                 analytical 
                                                                 title
   752       755       NULL            2           SUBJ
   752       NULL      b               NULL        STATE
   773       NULL      NULL            w           coll         Collection
                                                                 to which item
                                                                 belongs
   856       NULL      df              NULL        NUMBER       Item's logical 
                                                                 name
                                                                Digital ID
   859       NULL      f               NULL        NUMBER
   938       NULL      a               NULL        NUMBER       Use of 938 being
   938       NULL      NULL            bcd         m938          discontinued
   952       NULL      NULL            NULL        notes        NDLP local
   985       NULL      a               NULL        COLLID       AM collection
                                                                 identifier

Any MARC fields (or subfields) not "included" through columns 1-4 are not currently indexed.
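
The mapping rules can be sketched in a few lines of Python. This is only an illustration of how the table reads, not the code that processes marc-fields.pp: the names RULES and inquery_fields are hypothetical, only a handful of rows are copied from the table, and the interpretation of the include and exclude columns is an assumption based on the column headings.

```python
# Each rule mirrors one row of the marc-fields.pp table:
# (from_tag, to_tag, include_subfields, exclude_subfields, inquery_field).
# None stands for NULL. Only a few rows are reproduced here.
RULES = [
    ("100", "111", None, None, "AUTHOR"),
    ("245", None, "abp", None, "TITLE"),
    ("050", None, None, "u", "CALL"),
    ("260", None, "ab", None, "PUB"),
    ("260", None, "c", None, "date"),
    ("600", "653", None, "2", "SUBJ"),
    ("752", "755", None, "2", "SUBJ"),
    ("752", None, "b", None, "STATE"),
]


def inquery_fields(tag, subfield):
    """Return the InQuery fields that would receive this MARC tag/subfield."""
    targets = []
    for first, last, include, exclude, field in RULES:
        last = last or first  # a NULL "to" column means a single tag
        if not (first <= tag <= last):  # three-digit tags compare as strings
            continue
        if include is not None and subfield not in include:
            continue  # only the listed subfields are indexed
        if exclude is not None and subfield in exclude:
            continue  # the listed subfields are left out
        if field not in targets:
            targets.append(field)
    return targets


print(inquery_fields("245", "a"))  # title proper
print(inquery_fields("245", "c"))  # statement of responsibility
print(inquery_fields("752", "b"))  # matches two rows
```

Under this reading, a subfield that matches no row (such as 245 $c) yields an empty list, which corresponds to "not currently indexed".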

From dm_trans_tab.c

This table controls the types of index InQuery builds for its fields to support searching.

 
FieldDesc pp-fields=[]

   InQuery                   full  field  special
   field                     text  index  InQuery
                             index        function

   {"DOCID",  "Record ID",   OFF, ON,  INQ_ID,        OFF, INQ_COOP },
   {"NUMBER", "Various Nos", OFF, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"AUTHOR", "Author ",      ON, ON,  INQ_SOURCE,    OFF, INQ_COOP },
   {"SUBJ",   "subjects ",    ON, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"TITLE",  "Title ",       ON, ON,  INQ_TITLE,     OFF, INQ_COOP },
   {"cdate",  "change date", OFF, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"CALL",   "call number", OFF, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"M010",   "Record Id 2", OFF, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"M510c",  "MARC 510c",    ON, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"date",   "date of pub", OFF, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"medium", "medium",       ON, OFF, INQ_NULL_TYPE, OFF, INQ_COOP },
   {"coll",   "collection",   ON, OFF, INQ_NULL_TYPE, OFF, INQ_COOP },
   {"notes",  "notes",        ON, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"m938",   "m938",         ON, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"OTHER",  "other",        ON, ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"COLLID", "Collection ID",OFF,ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"STATE",  "State",        OFF,ON,  INQ_NULL_TYPE, OFF, INQ_COOP },
   {"PUB",    "Publisher",    ON, OFF, INQ_NULL_TYPE, OFF, INQ_COOP },

In a default American Memory search, the entered words are searched against all fields indexed as full text, i.e. those with ON in column 3. Fields that are indexed only for fielded searching (ON in column 4) can be used for canned searches using InQuery commands.
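
The two ON/OFF columns can be read off mechanically. The sketch below is an illustration only: FIELD_DESC is a hypothetical Python transcription of columns 1, 3, and 4 of the table, with True for ON and False for OFF.

```python
# (InQuery field, full-text index, field index), transcribed from
# columns 1, 3 and 4 of the dm_trans_tab.c table.
FIELD_DESC = [
    ("DOCID", False, True),
    ("NUMBER", False, True),
    ("AUTHOR", True, True),
    ("SUBJ", True, True),
    ("TITLE", True, True),
    ("cdate", False, True),
    ("CALL", False, True),
    ("M010", False, True),
    ("M510c", True, True),
    ("date", False, True),
    ("medium", True, False),
    ("coll", True, False),
    ("notes", True, True),
    ("m938", True, True),
    ("OTHER", True, True),
    ("COLLID", False, True),
    ("STATE", False, True),
    ("PUB", True, False),
]

# Fields searched when words are typed into the default query box.
default_search = [name for name, full_text, _ in FIELD_DESC if full_text]

# Fields reachable only through explicit fielded (canned) searches.
fielded_only = [
    name for name, full_text, fielded in FIELD_DESC if fielded and not full_text
]

print("default search:", default_search)
print("fielded only:  ", fielded_only)
```

Read literally, the table also marks PUB as full-text indexed, although the list in the next section does not mention it.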

BASIC SEARCHING: Which fields are searched?

Combining, inverting, and sorting the information in the tables above indicates that the following InQuery fields will be searched when a user types words into the American Memory query box:

AUTHOR
Includes MARC fields 100-111, all subfields.
SUBJ
Includes MARC fields 600-653, 655-699, and 752-755, all subfields except $2. Field 755 has not been standard MARC since 1995, and its information will probably be moved to 655 eventually. However, since P&P does not expect to make these changes soon, 755 will continue to be indexed within SUBJ for the foreseeable future. In September 1997, P&P decided that field 653 (uncontrolled subject terms) should be included in the same index as controlled vocabulary. [Before that, 653 had been indexed as "topics" to allow distinct treatment for P&P's Digital OneBox; American Memory had never made the distinction.] Field 654 (not indexed) might be used by Ameritech applicants, even if it is not used by LC.
TITLE
Includes MARC fields 245 and 246, subfields $a, $b, and $p, supplemented by subfields other than $a and $b from fields 730-740. The InQuery TITLE field is also what is displayed in a hits list.
M510c
Includes MARC field 510, subfield $c (location within source).
medium
Includes MARC field 300, all subfields.
notes
Includes MARC fields 500-599 and 952, all subfields.
coll
Includes MARC field 773, all subfields except $w.
m938
Includes MARC field 938, all subfields except $b, $c, and $d. P&P uses local field 938 for videodisk frame IDs.
OTHER
Includes MARC fields 700-711, all subfields.
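
The "combining, inverting, and sorting" step can be sketched as follows. This is a hedged illustration on a subset of rows: MAPPING, FULL_TEXT, describe, and searched_fields are hypothetical names, and the wording of the generated descriptions is invented for the example.

```python
from collections import defaultdict

# Subset of marc-fields.pp rows: (from_tag, to_tag, include, exclude, field).
MAPPING = [
    ("245", None, "abp", None, "TITLE"),
    ("246", None, "abp", None, "TITLE"),
    ("300", None, None, None, "medium"),
    ("500", "599", None, None, "notes"),
    ("730", "740", None, "ab", "TITLE"),
    ("952", None, None, None, "notes"),
    ("017", None, "a", None, "NUMBER"),
]

# Subset of dm_trans_tab.c: InQuery fields whose full-text index is ON.
FULL_TEXT = {"TITLE", "medium", "notes"}


def describe(first, last, include, exclude):
    """Render one mapping row as a short human-readable phrase."""
    tags = first if last is None else f"{first}-{last}"
    if include:
        return f"{tags} (only ${', $'.join(include)})"
    if exclude:
        return f"{tags} (all except ${', $'.join(exclude)})"
    return f"{tags} (all subfields)"


def searched_fields(mapping, full_text):
    """Invert the mapping: InQuery field -> sorted MARC sources."""
    inverted = defaultdict(list)
    for first, last, include, exclude, field in mapping:
        if field in full_text:  # combine with the second table
            inverted[field].append(describe(first, last, include, exclude))
    return {field: sorted(parts) for field, parts in sorted(inverted.items())}


for field, sources in searched_fields(MAPPING, FULL_TEXT).items():
    print(field, "<-", "; ".join(sources))
```

NUMBER drops out of the result because its full-text index is OFF, so it is not part of a default search.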

BASIC SEARCHING: What controls the order of results?

For current American Memory searches, four separate InQuery searches are done if more than one word is entered, and the results are displayed as a list of hits in four groupings. Hits that are returned by more than one search are displayed only within the first grouping for which they were retrieved. [Within each of these searches, hits are displayed in order of relevancy rank. Weights could be used to favor words found in some InQuery fields (e.g. TITLE and SUBJ) over others. This feature is not currently used for American Memory, but it is used for some of the other InQuery applications (THOMAS, GLIN, HLAS, etc.). The weights could be hand-crafted for American Memory on the basis of experience and user feedback.]

  1. A search for the "exact phrase" uses the InQuery @3( ) function, looking for the words in the given order and within 3 words of each other. [@3 is used instead of @2 to ensure that phrases that include stopwords such as "of" are recognized.]
    [Editor's note: InQuery actually uses the # character to begin function names. LC applications accept @ because the # character is reserved for special use on the WWW. I chose to use @ here because it works both when typed into an American Memory search box and in URLs for "canned" searches, as used in many of the American Memory special presentations. CRA]
  2. A search for all words near each other uses the InQuery @UW20( ) function, looking for all the words, in any order, to appear within 20 words of each other.
  3. A search for all words, but not necessarily near each other, uses the InQuery function @BAND( ) (Boolean AND).
  4. The last search uses the InQuery @SUM( ) function, which can be considered the InQuery default. Ranking takes into account
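
The four-search sequence and the "first grouping wins" rule can be sketched as below. This is an illustration under stated assumptions, not the American Memory implementation: build_queries and group_hits are hypothetical names, the hit lists are made-up document IDs, and the single-word case (one @SUM search) is an assumption, since the text describes only what happens when more than one word is entered.

```python
def build_queries(user_input):
    """Return the InQuery query strings, in display order, for the input."""
    words = user_input.split()
    terms = " ".join(words)
    if len(words) < 2:
        # Assumption: a single word needs no phrase or proximity search.
        return [f"@SUM({terms})"]
    return [
        f"@3({terms})",     # 1. "exact phrase": in order, within 3 words
        f"@UW20({terms})",  # 2. all words, any order, within 20 words
        f"@BAND({terms})",  # 3. all words anywhere (Boolean AND)
        f"@SUM({terms})",   # 4. InQuery default ranking
    ]


def group_hits(results_per_search):
    """Keep each hit only in the first grouping that retrieved it."""
    seen = set()
    groups = []
    for hits in results_per_search:
        fresh = [doc for doc in hits if doc not in seen]
        seen.update(fresh)
        groups.append(fresh)
    return groups


print(build_queries("rose window"))
# Made-up hit lists for the four searches, each already in rank order.
print(group_hits([["a", "b"], ["b", "c"], ["a", "c", "d"], ["e", "a"]]))
```

Because each later search is broader than the one before it, most of its hits have usually been shown already, so only the newly retrieved documents appear in the later groupings.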

FIELDED SEARCHING

Terms can be searched for specifically within individual InQuery fields, using InQuery @field commands. Try entering @field(AUTHOR @band(booker washington)) or @field(TITLE @band(rose window)) in the American Memory Collection Search box. Such fielded searches could also be incorporated into an "advanced" search form (as in P&P's Digital OneBox).
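
An "advanced" search form would need to assemble such commands from its inputs. The sketch below shows one way that might look: fielded_query and FIELD_INDEXED are hypothetical names, the set is transcribed from the fields with ON in column 4 of the dm_trans_tab.c table, and rejecting other fields is an assumption about how such a form should behave.

```python
# InQuery fields with a field index (ON in column 4 of dm_trans_tab.c).
# Case is preserved because the tables distinguish upper and lower case.
FIELD_INDEXED = {
    "DOCID", "NUMBER", "AUTHOR", "SUBJ", "TITLE", "cdate", "CALL", "M010",
    "M510c", "date", "notes", "m938", "OTHER", "COLLID", "STATE",
}


def fielded_query(field, words, operator="band"):
    """Build an @field command restricting the words to one InQuery field."""
    if field not in FIELD_INDEXED:
        raise ValueError(f"{field!r} has no field index")
    return f"@field({field} @{operator}({' '.join(words)}))"


print(fielded_query("AUTHOR", ["booker", "washington"]))
print(fielded_query("TITLE", ["rose", "window"]))
```

The two calls reproduce the example commands given above.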



MARC fields indexed for American Memory and Digital OneBox
National Digital Library Program, Library of Congress
Comments: caar@loc.gov (10/01/97)