This is an NDLP documentation DRAFT

Feature codes used in filenames to distinguish image versions and special categories of pages

Notes from meeting 8-20-97 with updates to master alphabetic list of feature codes

  1. FINDINGS OF THE GROUP THAT MET
    Anderson, Arms, Bramel, Fleischhauer, Friedland, Graham, Heckscher (from Carl's memory)
  2. COLLECTION OF INFORMATION FROM DIFFERENT SOURCES
  3. COMPILATION IN ALPHABETICAL ORDER OF ALL THE FEATURE CODES THAT HAVE BEEN USED (last modified 03/03/2000)
  4. ADDED NOTES ON THE FILENAMES USED FOR CONGRESSIONAL BROADSIDES
  5. OTHER INFORMATION CONTAINED IN FILENAMES. [Added 8/28/97]

SECTION ONE
FINDINGS OF THE GROUP THAT MET

  1. It is desirable to mark the archival version of documents.

    There did not seem to be a reason to mark the non-archival versions, although this was not exhaustively discussed. (Since we may have instances with, say, two GIF files at different levels of resolution, there may be a reason to revisit this subquestion. Possible test case: the broadside pilot.)

  2. The letter "a" would serve well. It would be placed as the final feature code, after any letters marking other feature codes.

    The closest point of conflict is with segments, which may use "a" for the top row of the grid. But that "a" is always followed by a numeral giving the position in the row, so a final "a" can only be the archival marker. For example, "014sa4a.tif" would mean control number 014, segment, row "a," column 4, archival version.
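    For anyone scripting checks against this rule, here is a minimal Python sketch of how "014sa4a.tif" splits apart. The pattern and the name parse_segment_name are hypothetical; the sketch assumes an all-digit control number and no feature codes other than a grid segment and the archival "a".

```python
import re

# Hypothetical pattern (an assumption, not an NDLP spec): numeric control
# number, optional grid segment "s" + row letter + column number, then an
# optional final "a" for the archival version.
SEGMENT_RE = re.compile(
    r"^(?P<control>\d+)"
    r"(?:s(?P<row>[a-z])(?P<col>\d+))?"
    r"(?P<archival>a)?"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_segment_name(filename):
    """Split a name such as '014sa4a.tif' into its parts."""
    m = SEGMENT_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    col = m.group("col")
    return {
        "control": m.group("control"),
        "row": m.group("row"),
        "column": int(col) if col else None,
        # The row letter always has a digit after it, so a trailing "a"
        # can only be the archival marker.
        "archival": m.group("archival") is not None,
        "extension": m.group("ext"),
    }


print(parse_segment_name("014sa4a.tif"))
print(parse_segment_name("014sa4.tif"))
```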

  3. Archival "a" markers will be applied to future collections (e.g., Puerto Rico Portrait), not to collections already in midstream. There may be some instances in which the filenames don't have room for the extra character; then we punt.

  4. For past or current collections, we resolve to write "read-me" files that leave a Rosetta Stone for our heirs and assigns. Carl will cook up an example (but when?).

  5. In the master set of feature codes in Section Three, you will find three duplicates: "a," "c," and "t." Two are anomalies from the Coolidge collection and should not govern future work: "a" for a magazine cover and "c" for a table of contents. As for "t," the weight of past practice favors "thumbnail" over the George Washington Papers use for "target."


SECTION TWO
COLLECTION OF INFORMATION FROM DIFFERENT SOURCES

From SIG contract

g    title page (e.g., for a book)
n    table of contents (e.g., for a book)
l    list of illus (e.g., for a book)
p    repeating page image (e.g., for a book or microfilm frame)
x    index (e.g., for a book)
c    cover (e.g., for a book)
s    segment (too big pages) (e.g., for a book or microfilm frame)
     NOTE: segments can be further designated to indicate a grid:
          sa1, sa2, sa3
          sb1, sb2, sb3
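
If a vendor needs the full set of grid codes for an oversize page, they can be generated row by row as in the example above. A small sketch; the function name segment_suffixes and the limit of 26 rows (one per letter) are our assumptions:

```python
import string


def segment_suffixes(rows, cols):
    """List grid segment codes row by row: sa1, sa2, ..., sb1, ...

    Rows are lettered a, b, c, ... and columns numbered from 1, as in the
    sa1/sb1 example. The 26-row cap is an assumption (one letter per row).
    """
    if not 1 <= rows <= 26 or cols < 1:
        raise ValueError("need 1-26 rows and at least one column")
    return [f"s{string.ascii_lowercase[r]}{c}"
            for r in range(rows)
            for c in range(1, cols + 1)]


print(segment_suffixes(2, 3))
# ['sa1', 'sa2', 'sa3', 'sb1', 'sb2', 'sb3']
```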

Preservation Resources microfilm scanning

additional codes
y    irregularity target (microfilm)

George Washington special

t    microfilm content target (for George Washington Papers)

Pictorial images new RFP

u    uncompressed archival file (pictorial)
r    typical reference file (JPEG) (pictorial)
v    larger file (JPEG) (pictorial)
t    thumbnail (pictorial)

Fed Theater Project work by Picture Elements, Inc.

a    archival version (JPEG compressed) (Fed Theater PIXEL
     documents)
r    "access version" (bitonal TIFF) (Fed Theater PIXEL
     documents)

Congressional Broadsides (see addendum at bottom)

p    image shows full page (Cong Bdsds)
q    page with flap laid down (there is only one) (Cong Bdsds)
t    top (Cong Bdsds)
b    bottom (Cong Bdsds)
l    left (Cong Bdsds)
c    center (Cong Bdsds)
r    right (Cong Bdsds)
d    detail (Cong Bdsds)
e    second detail (Cong Bdsds)
f    third detail (Cong Bdsds)
w    top left of four-way segment (Cong Bdsds)
x    top right (Cong Bdsds)
y    bottom left (Cong Bdsds)
z    bottom right (Cong Bdsds)

African American pamphlets (and maybe Calif books and Woman Suffrage)

Cover

c    first character in filename, e.g., c0d10.tif

Illustrations

a    illustration marked; end of the filename includes the indicator
     of illustration sequence, e.g., "a0d13-01.tif" is the first (01)
     illustration in pamphlet "0d13."
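
A rough Python sketch that tells the two kinds of files apart, assuming every cover and illustration name looks like the two examples above. The pattern and describe_pamphlet_file are hypothetical; page images and other names simply return None:

```python
import re

# Assumed shape, inferred from the two examples only: a one-letter prefix
# ("c" cover, "a" illustration), the pamphlet identifier, and, for
# illustrations, a hyphen plus a sequence number.
PAMPHLET_RE = re.compile(
    r"^(?P<prefix>[ac])(?P<item>[0-9a-z]+)(?:-(?P<seq>\d+))?\.tif$"
)


def describe_pamphlet_file(filename):
    """Return (kind, pamphlet id, illustration number) or None."""
    m = PAMPHLET_RE.match(filename)
    if m is None:
        return None
    if m.group("prefix") == "c":
        return ("cover", m.group("item"), None)
    seq = m.group("seq")
    return ("illustration", m.group("item"), int(seq) if seq else None)


print(describe_pamphlet_file("c0d10.tif"))     # ('cover', '0d10', None)
print(describe_pamphlet_file("a0d13-01.tif"))  # ('illustration', '0d13', 1)
```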

Illustrations in early (but not very early) collections

[Carl, can you make a list of the collections to which this description applies? Not all of us have internalized the order in which materials were converted!!]

After the Calif books and Murray pamphlets, and through upper Midwest part one, we made separate cropped illustration images IN ADDITION TO the page images.

The large versions of these separate illustration images have a hyphen in the middle of the filename. The filename is the "book number" (3 characters?), then a hyphen, then the illustration number. These images often (always?) also had a thumbnail made (scanned and dithered at the same time as the large image). The thumbnail filename is identical except that it has a "t" where the hyphen was.
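
The hyphen-to-"t" rule is mechanical enough to script. A minimal sketch, assuming exactly one hyphen per name; thumbnail_name and the sample filename "abc-03.gif" are made up for illustration:

```python
import os


def thumbnail_name(big_name):
    """Map a hyphenated illustration image to its thumbnail's filename.

    Same name, but with "t" in place of the hyphen; the extension is kept
    as is. Assumes exactly one hyphen in the name (an assumption).
    """
    stem, ext = os.path.splitext(big_name)
    if stem.count("-") != 1:
        raise ValueError(f"expected exactly one hyphen: {big_name!r}")
    return stem.replace("-", "t") + ext


print(thumbnail_name("abc-03.gif"))  # abct03.gif
```

The reverse mapping is not attempted here, since a "t" can also occur elsewhere in a name.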

BTW: where we plan to make inline cigarette-pack GIFs for books with separate cropped illustration images, we'll work from the hyphen images, shrink them into GIFs, and ignore the "t" images. For later books, where there are no separate "-" images, we'll make the cigarette-pack images from the full page image for pages with illustrations, whether that image is bitonal or grayscale.

Coolidge collection serials

c    table of contents (Coolidge serials)
e    editorial page (Coolidge serials)
a    cover (sort correctly by controlpage) (Coolidge serials)
          first of two exposures:
               outside front cover when printpage is 000
               inside back cover when printpage is 999
b    cover (sort correctly by controlpage) (Coolidge serials)
          second of two exposures:
               inside front cover when printpage is 000
               outside back cover when printpage is 999

Audio

Audio files in the folk music collection (Cowell, Todd & Sonkin?) use an "a"
PREFIX to distinguish them from the "p" photos, "d" drawings, etc., in the
same collection. [The image items follow pictorial conventions with
"u", "r", and "t" suffixes.]

The Nation's Forum audio files have no feature markings.

Motion pictures

Motion pictures delivered in segments have filenames that end in "s1,"
"s2," "s3," etc.

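Segment numbers should be compared as numbers, not as text, or "s10" sorts ahead of "s2". A small sketch for the segments of one film; segment_number and the sample filenames are hypothetical:

```python
import re


def segment_number(filename):
    """Return N for a motion-picture file whose name ends in "sN"."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    m = re.search(r"s(\d+)$", stem)
    if m is None:
        raise ValueError(f"no segment number in {filename!r}")
    return int(m.group(1))


# Hypothetical filenames; a plain string sort would put s10 before s2.
names = ["film01s10.mpg", "film01s2.mpg", "film01s1.mpg"]
print(sorted(names, key=segment_number))
# ['film01s1.mpg', 'film01s2.mpg', 'film01s10.mpg']
```
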
SECTION THREE
COMPILATION IN ALPHABETICAL ORDER OF ALL THE FEATURE CODES THAT HAVE BEEN USED

NOTE: THE CONGRESSIONAL BROADSIDES COULD BE TREATED AS ANOMALOUS -- THEY ARE _NOT_ LISTED HERE. THE COOLIDGE SERIALS COULD ALSO BE TREATED AS ANOMALOUS -- BUT THEY _ARE_ LISTED HERE.

ASTERISK * MARKS LESS-DESIRABLE USAGE FOR FUTURE PROJECTS (ACCORDING TO CARL -- SQUAWK IF YOU DISAGREE)

a    archival version (JPEG compressed) (Fed Theater PIXEL
     documents)
a*   cover (sort correctly by controlpage) (Coolidge serials)
          first of two exposures:
               outside front cover when printpage is 000
               inside back cover when printpage is 999
          NOTE: "a" is also used as first character for the
          filenames for illustrations in the African-American
          pamphlets (and maybe Calif books and Woman
          Suffrage documents)
b    cover (sort correctly by controlpage) (Coolidge serials)
          second of two exposures:
               inside front cover when printpage is 000
               outside back cover when printpage is 999
c    cover (e.g., for a book)
          NOTE: "c" is used as first character for the filenames
          for covers in the African-American pamphlets (and
          maybe Woman Suffrage documents)
c*   table of contents (Coolidge serials)
e    editorial page (Coolidge serials)
f    full spatial resolution but compressed for pictorial (added
     03/03/2000 for additional version coming from New-York Historical
     Society).  Higher resolution than v.
g    title page (e.g., for a book)
i    used by P&P for HABS/HAER caption lists (added 11/10/97)
l    list of illus (e.g., for a book)
n    table of contents (e.g., for a book)
p    repeating page image (e.g., for a book or microfilm frame)
q    additional reference/access/service file (lower resolution/quality
     than r) (added 10/97 for Brown JPEG images for page-turning)
r    typical reference/access/service file (JPEG) (pictorial)
r    reference/access/service file (bitonal TIFF) (Fed Theater PIXEL
     documents)
s    segment (too big pages) (e.g., for a book or microfilm frame)
     NOTE: segments can be further designated to indicate a grid
     (row and column):
          sa1, sa2, sa3
          sb1, sb2, sb3
t    thumbnail (pictorial)
t*   microfilm content target (for George Washington Papers)
u    uncompressed archival file (pictorial)
v    larger file (JPEG) (pictorial)
x    index (e.g., for a book)
y    irregularity target (microfilm)
z    blank page (added 10/97, in part to facilitate interleaving of
     pages in documents scanned as all left-hand (LH) pages followed by
     all right-hand (RH) pages rather than in page number order)
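
The compilation above can also be kept as data so that a script can flag the starred usages and the letters with more than one meaning. A sketch, with meanings abbreviated from the list; FEATURE_CODES, meanings, and is_ambiguous are hypothetical names:

```python
# Section Three as data: code -> list of (meaning, starred) pairs.
# starred=True marks usages flagged "*" as less desirable for new work.
FEATURE_CODES = {
    "a": [("archival version, JPEG compressed (Fed Theater PIXEL)", False),
          ("cover, first of two exposures (Coolidge serials)", True)],
    "b": [("cover, second of two exposures (Coolidge serials)", False)],
    "c": [("cover (e.g., for a book)", False),
          ("table of contents (Coolidge serials)", True)],
    "e": [("editorial page (Coolidge serials)", False)],
    "f": [("full spatial resolution, compressed (pictorial)", False)],
    "g": [("title page", False)],
    "i": [("HABS/HAER caption list (P&P)", False)],
    "l": [("list of illustrations", False)],
    "n": [("table of contents", False)],
    "p": [("repeating page image", False)],
    "q": [("additional reference file, lower quality than r", False)],
    "r": [("typical reference file, JPEG (pictorial)", False),
          ("reference file, bitonal TIFF (Fed Theater PIXEL)", False)],
    "s": [("segment (too big pages)", False)],
    "t": [("thumbnail (pictorial)", False),
          ("microfilm content target (George Washington Papers)", True)],
    "u": [("uncompressed archival file (pictorial)", False)],
    "v": [("larger file, JPEG (pictorial)", False)],
    "x": [("index", False)],
    "y": [("irregularity target (microfilm)", False)],
    "z": [("blank page", False)],
}


def meanings(code, include_starred=False):
    """Return the recorded meanings of a one-letter feature code."""
    return [m for m, starred in FEATURE_CODES.get(code, [])
            if include_starred or not starred]


def is_ambiguous(code):
    """True if the code has more than one recorded meaning."""
    return len(FEATURE_CODES.get(code, [])) > 1


print(meanings("t"))  # ['thumbnail (pictorial)']
print(sorted(c for c in FEATURE_CODES if is_ambiguous(c)))
```

Note that is_ambiguous reports "r" as well as "a," "c," and "t," because "r" is listed twice; both of its meanings are reference files, so it is not a real conflict.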

SECTION FOUR
ADDED NOTES ON THE FILENAMES USED FOR CONGRESSIONAL BROADSIDES
RECORDED HERE FOR THE HISTORICAL RECORD ONLY

Characters one through four identify the item
     C02A Constitutional Convention 2A
     C05_ Constitutional Convention 5
     023_ Continental Congress 23
     136A Continental Congress 136A

Character five
     Copy number    1 through 9; there is also one example of "H"
                    for a 39th copy

Characters six and seven
     Image control number     typically "01" is a front and "02"
                              is a back; several multipage
                              items are included, and these go to
                              thirty or more pages

Character eight
     Feature code   See the Congressional Broadsides list in
                    Section Two of this document
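
A minimal sketch that splits one of these eight-character names into its fields and looks up the final character in the Congressional Broadsides codes. The function name and the sample filenames "C02A101p.tif" and "023_H02w" are hypothetical, and the sketch assumes the name carries either an extension after the eight characters or none at all:

```python
# Codes from the Congressional Broadsides list (Cong Bdsds).
BROADSIDE_FEATURES = {
    "p": "full page", "q": "page with flap laid down",
    "t": "top", "b": "bottom", "l": "left", "c": "center", "r": "right",
    "d": "detail", "e": "second detail", "f": "third detail",
    "w": "top left of four-way segment", "x": "top right",
    "y": "bottom left", "z": "bottom right",
}


def parse_broadside_name(filename):
    """Split an 8-character Congressional Broadsides name into fields."""
    stem = filename.split(".", 1)[0]
    if len(stem) != 8:
        raise ValueError(f"expected 8 characters, got {stem!r}")
    feature = stem[7]
    return {
        "item": stem[0:4],    # e.g. "C02A" or "023_"
        "copy": stem[4],      # "1"-"9", or "H" in the one 39th-copy case
        "image": stem[5:7],   # typically "01" front, "02" back
        "feature": feature,
        "feature_meaning": BROADSIDE_FEATURES.get(feature, "unknown"),
    }


print(parse_broadside_name("C02A101p.tif"))
```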

SECTION FIVE
OTHER INFORMATION CONTAINED IN FILENAMES

In general, target images (within a directory) are named 000. Target images are not intended for display; the first page for display is 001. These numbers are the "ccc" part of the filename, i.e., the "controlpage," not the "printpage." Print page numbers, if present and if recorded, form the latter part of a filename; the classic pattern is cccpppf, with f = feature code or codes. LOTS of collections adhere to this pattern, e.g., the woman suffrage documents, where you'll find a target for each item.
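
A sketch of a parser for the classic cccpppf pattern. It assumes a filename stem without its extension, exactly three digits each for controlpage and printpage, and lowercase feature letters; parse_classic is a hypothetical name, and collections that deviate (such as the Coolidge serials) will not fit it:

```python
import re

# ccc = 3-digit controlpage, ppp = optional 3-digit printpage,
# then zero or more lowercase feature-code letters.
CLASSIC_RE = re.compile(r"^(?P<ccc>\d{3})(?P<ppp>\d{3})?(?P<features>[a-z]*)$")


def parse_classic(stem):
    """Parse a cccpppf filename stem (no extension)."""
    m = CLASSIC_RE.match(stem)
    if m is None:
        raise ValueError(f"not a cccpppf name: {stem!r}")
    ccc = m.group("ccc")
    ppp = m.group("ppp")
    return {
        "controlpage": int(ccc),
        "printpage": int(ppp) if ppp else None,
        "features": m.group("features"),
        "is_target": ccc == "000",  # targets are named 000, not displayed
    }


print(parse_classic("000"))
print(parse_classic("003001"))
```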

The treatment of serials in Coolidge was unique in its handling of the four cover images: front, inside front, inside back, and back. The convention uses the cccpppf filename type. Without looking in the directory to confirm, Carl thinks this is what is there in the case where we have two issues of a serial under one bib record:


000    target

0011000a  front cover, first of two issues
0021000b  inside front cover
0031001   printpage 1, issue one
0041002   printpage 2

etc.

0561054   printpage 54, issue one
0571999a  inside back cover, issue one (slight chance this was "b")
0581999b  back cover
0592000a  front cover, issue 2
0602000b  inside front cover, issue 2
0612001   printpage 1, issue 2
0622002   printpage 2

etc.

1222060   printpage 60, issue 2
1232999a  inside back cover, issue 2
1242999b  back cover, issue 2
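
A sketch that interprets names in the listing above. The stems have one more digit than cccpppf; the sketch assumes that fourth digit is the issue number (our reading of the listing, not stated in the text) and applies the Coolidge cover rules for "a" and "b" at printpage 000 and 999. describe_coolidge is a hypothetical name:

```python
# Coolidge cover rules: (printpage, feature) -> which cover it is.
COVER_NAMES = {
    ("000", "a"): "outside front cover",
    ("000", "b"): "inside front cover",
    ("999", "a"): "inside back cover",
    ("999", "b"): "outside back cover",
}


def describe_coolidge(stem):
    """Interpret a Coolidge serial filename stem such as '0571999a'.

    Assumed layout: ccc (controlpage) + one issue digit + ppp (printpage)
    + optional feature letter.
    """
    if stem == "000":
        return "target"
    if len(stem) not in (7, 8) or not stem[:7].isdigit():
        raise ValueError(f"unexpected stem: {stem!r}")
    ccc, issue, ppp = stem[:3], stem[3], stem[4:7]
    feature = stem[7:]  # "" for ordinary pages
    if feature:
        what = COVER_NAMES.get((ppp, feature), f"feature {feature!r}")
    else:
        what = f"printpage {int(ppp)}"
    return f"controlpage {int(ccc)}, issue {issue}: {what}"


for name in ["000", "0011000a", "0031001", "0571999a", "1242999b"]:
    print(name, "->", describe_coolidge(name))
```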


Feature codes in filenames
National Digital Library Program
Comments: caar@loc.gov (03/03/2000)