Notes from meeting 8-20-97 with updates to master alphabetic list of feature codes
It is desirable to mark the archival version of documents.
There did not seem to be a reason to mark the non-archival versions although this was not exhaustively discussed. (Since we may have instances with, say, two GIF files at different levels of resolution, there may be a reason to revisit this subquestion. Possible test case: the broadside pilot.)
The letter "a" would serve well. It would be placed as the final feature code, after other letters marking other feature codes.
The closest conflict arises with segments, which may include "a" for the top row of a grid. That row "a" is always followed by a numeral giving the position in the row, so the two uses can be told apart, e.g., "014sa4a.tif" would mean control number 014, segment, row "a," column 4, archival version.
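As a rough illustration of how a trailing archival "a" stays distinguishable from a row "a," here is a small Python sketch that parses a segment filename. The pattern and the function name parse_segment_name are assumptions made for this example, not an existing NDLP tool; the sketch also assumes a three-digit control number with no printpage digits.

```python
import re

# Assumed shape: 3-digit control number, "s" for segment, a row letter,
# a column number, then an optional final "a" marking the archival version.
SEGMENT_RE = re.compile(
    r"(?P<control>\d{3})s(?P<row>[a-z])(?P<col>\d+)(?P<archival>a?)\.\w+"
)

def parse_segment_name(filename):
    """Return the parts of a segment filename, or None if it does not fit."""
    m = SEGMENT_RE.fullmatch(filename)
    if m is None:
        return None
    return {
        "control": m.group("control"),
        "row": m.group("row"),
        "column": int(m.group("col")),
        "archival": m.group("archival") == "a",
    }
```

Under these assumptions, "014sa4a.tif" parses as row "a," column 4, archival, and "014sa4.tif" parses the same way but as a non-archival file. A name such as "014sa.tif" is rejected because the row letter is not followed by a numeral.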
Archival "a" markers will be applied in future collections, e.g., Puerto Rico Portrait, not collections in midstream. There may be some instances in which the filenames don't have room -- then we punt.
For past or current collections, we resolve to write "read-me" files that leave a Rosetta Stone for our heirs and assigns. Carl will cook up an example (but when?).
In the master set of feature names in section three, you will find three duplicates: "a," "c," and "t." Two are anomalies from the Coolidge collection and should not govern future work: "a" for a magazine cover and "c" for a table of contents. As far as "t" goes, the weight of past practice favors "thumbnail" and not the George Washington use for "target."
g title page (e.g., for a book)
n table of contents (e.g., for a book)
l list of illus (e.g., for a book)
p repeating page image (e.g., for a book or microfilm frame)
x index (e.g., for a book)
c cover (e.g., for a book)
s segment (too big pages) (e.g., for a book or microfilm frame)
NOTE: segments can be further designated to indicate a grid:
sa1, sa2, sa3
sb1, sb2, sb3
y irregularity target (microfilm)
t microfilm content target (for George Washington Papers)
u uncompressed archival file (pictorial)
r typical reference file (JPEG) (pictorial)
v larger file (JPEG) (pictorial)
t thumbnail (pictorial)
a archival version (JPEG compressed) (Fed Theater PIXEL documents)
r "access version" (bitonal TIFF) (Fed Theater PIXEL documents)
p image shows full page (Cong Bdsds)
q page with flap laid down (there is only one) (Cong Bdsds)
t top (Cong Bdsds)
b bottom (Cong Bdsds)
l left (Cong Bdsds)
c center (Cong Bdsds)
r right (Cong Bdsds)
d detail (Cong Bdsds)
e second detail (Cong Bdsds)
f third detail (Cong Bdsds)
w top left of four-way segment (Cong Bdsds)
x top right (Cong Bdsds)
y bottom left (Cong Bdsds)
z bottom right (Cong Bdsds)
Cover
c first character in filename, e.g., c0d10.tif
Illustrations
a illustration marked; end of the filename includes the indicator
of illustration sequence, e.g., "a0d13-01.tif" is the first (01)
illustration in pamphlet "0d13."
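The two first-character conventions above can be sketched in Python as follows. This is only an illustration built from the two example filenames in the text; the function name classify_pamphlet_file and both patterns are assumptions, and the sketch does not know how ordinary page images in these collections are named.

```python
import re

# Assumed shapes, taken from the two examples in the text:
#   "c" + pamphlet id                     -> cover
#   "a" + pamphlet id + "-" + sequence    -> illustration
ILLUSTRATION_RE = re.compile(r"a(?P<pamphlet>\w+)-(?P<seq>\d+)\.\w+")
COVER_RE = re.compile(r"c(?P<pamphlet>\w+)\.\w+")

def classify_pamphlet_file(filename):
    """Return (kind, pamphlet id, illustration sequence or None), or None."""
    m = ILLUSTRATION_RE.fullmatch(filename)
    if m:
        return ("illustration", m.group("pamphlet"), int(m.group("seq")))
    m = COVER_RE.fullmatch(filename)
    if m:
        return ("cover", m.group("pamphlet"), None)
    return None
```

With these patterns, "a0d13-01.tif" is read as illustration 1 of pamphlet "0d13," and "c0d10.tif" as the cover of pamphlet "0d10."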
[Carl -- can you make a list of the collections to which this description applies. Not all of us have the order in which materials were converted internalized!!]
After Cal Books and Murray pamphlets and through the upper Midwest part one, we made separate cropped illustration images IN ADDITION TO the page images.
The big versions of these separate illustration images have hyphens in the middle of their filenames. The file name is the "book number" (3 characters?), then a hyphen, then the illustration number. These often (always?) also had a thumbnail made (scanned and dithered at the same time as the big image). The thumbnail filenames are identical except that they have a "t" where the hyphen was.
BTW: where we are planning to make inline cigarette-pack GIFs for books with separate cropped illustration images, we'll work from the hyphen images and shrink-GIF them and ignore the "t" images. For later books, where there are no separate "-" images, we'll make the cigarette pack images from the full page image for pages-with-illustrations, whether it is bitonal or grayscale.
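The hyphen-to-"t" relationship between a big illustration image and its thumbnail can be sketched in a few lines of Python. The function name thumbnail_name and the sample filename are hypothetical, and the sketch assumes each big-image filename contains exactly one hyphen.

```python
def thumbnail_name(big_name):
    """Derive the thumbnail filename from a big illustration filename.

    Assumes the only hyphen in the name is the one separating the book
    number from the illustration number.
    """
    if big_name.count("-") != 1:
        raise ValueError("expected exactly one hyphen in %r" % big_name)
    return big_name.replace("-", "t")
```

For a hypothetical big image named "123-01.tif," this gives "123t01.tif." Names with no hyphen (the later books with no separate illustration images) raise an error rather than being silently passed through.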
c table of contents (Coolidge serials)
e editorial page (Coolidge serials)
a cover (sort correctly by controlpage) (Coolidge serials)
first of two exposures:
outside front cover when printpage is 000
inside back cover when printpage is 999
b cover (sort correctly by controlpage) (Coolidge serials)
second of two exposures:
inside front cover when printpage is 000
outside back cover when printpage is 999
The folk music collection (Cowell, Todd & Sonkin?) audio files use an "a"
PREFIX to distinguish them from the "p" photos and "d" drawings in the
same collection, etc. [The image items follow pictorial conventions with
"u", "r", and "t" suffixes.]
The Nation's forum audio files have no feature markings.
Motion pictures delivered in segments have file names that end "s1," "s2," "s3," etc.
NOTE: THE CONGRESSIONAL BROADSIDES COULD BE TREATED AS ANOMALOUS -- THEY ARE _NOT_ LISTED HERE. THE COOLIDGE SERIALS COULD ALSO BE TREATED AS ANOMALOUS -- BUT THEY _ARE_ LISTED HERE.
ASTERISK * MARKS LESS-DESIRABLE USAGE FOR FUTURE PROJECTS (ACCORDING TO CARL -- SQUAWK IF YOU DISAGREE)
a archival version (JPEG compressed) (Fed Theater PIXEL
documents)
a* cover (sort correctly by controlpage) (Coolidge serials)
first of two exposures:
outside front cover when printpage is 000
inside back cover when printpage is 999
NOTE: "a" is also used as first character for the
filenames for illustrations in the African-American
pamphlets (and maybe Calif books and Woman
Suffrage documents)
b cover (sort correctly by controlpage) (Coolidge serials)
second of two exposures:
inside front cover when printpage is 000
outside back cover when printpage is 999
c cover (e.g., for a book)
NOTE: "c" is used as first character for the filenames
for covers in the African-American pamphlets (and
maybe Woman Suffrage documents)
c* table of contents (Coolidge serials)
e editorial page (Coolidge serials)
f full spatial resolution but compressed for pictorial (added
03/03/2000 for additional version coming from New-York Historical
Society). Higher resolution than v.
g title page (e.g., for a book)
i used by P&P for HABS/HAER caption lists (added 11/10/97)
l list of illus (e.g., for a book)
n table of contents (e.g., for a book)
p repeating page image (e.g., for a book or microfilm frame)
q additional reference/access/service file (lower resolution/quality
than r) (added 10/97 for Brown JPEG images for page-turning)
r typical reference/access/service file (JPEG) (pictorial)
r reference/access/service file (bitonal TIFF) (Fed Theater PIXEL
documents)
s segment (too big pages) (e.g., for a book or microfilm frame)
NOTE: segments can be further designated to indicate a grid
(row and column):
sa1, sa2, sa3
sb1, sb2, sb3
t thumbnail (pictorial)
t* microfilm content target (for George Washington Papers)
u uncompressed archival file (pictorial)
v larger file (JPEG) (pictorial)
x index (e.g., for a book)
y irregularity target (microfilm)
z blank page (added 10/97, in part to facilitate interleaving of
pages in documents scanned all LH followed by all RH rather than in
page number order)
Digits one thru four identify the item
C02A Constitutional Convention 2A
C05_ Constitutional Convention 5
023_ Continental Congress 23
136A Continental Congress 136A
Digit number five
Copy number 1 thru 9, also one example of "H" for a 39th copy
Digits six and seven
Image control number; typically "01" is a front and "02" is a back. Several multipage items are included; these go to thirty or more pages.
Digit eight
Feature code; see list in first section of this document
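The eight-position layout above can be split by position, as in this Python sketch. The function name parse_eight_char_name and the sample name "C02A101p" are made up for illustration; the sketch assumes every filename stem is exactly eight characters and always carries a feature code, which the notes do not confirm.

```python
def parse_eight_char_name(stem):
    """Split an eight-character filename stem into its four fields."""
    if len(stem) != 8:
        raise ValueError("expected 8 characters, got %d" % len(stem))
    return {
        "item": stem[0:4],     # positions 1-4, e.g. "C02A" or "023_"
        "copy": stem[4],       # position 5, "1" thru "9" (or "H")
        "image": stem[5:7],    # positions 6-7, image control number
        "feature": stem[7],    # position 8, feature code
    }
```

The item field is returned exactly as written, including any trailing underscore, since the notes do not say whether the underscore is padding.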
In general our target images (within a directory) receive the name 000. Target images are not intended for display. The first page for display is 001. These are the "ccc" part of the filename, i.e., "controlpage" not "printpage." Print page numbers if present and if recorded are the latter part of a filename; classic pattern is cccpppf, with f=feature code or codes. LOTS of collections adhere to this pattern, e.g., woman suffrage documents. You'll find a target there for each item.
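A minimal Python sketch of the cccpppf pattern follows. The function name parse_cccpppf and the sample names are hypothetical. The sketch assumes a three-digit controlpage, an optional three-digit printpage, and feature codes that start with a lowercase letter (digits are allowed after that so segment grids such as "sa4" fit).

```python
import re

# ccc = controlpage, ppp = printpage (optional), f = feature code(s) (optional)
CCCPPPF_RE = re.compile(
    r"(?P<ccc>\d{3})(?P<ppp>\d{3})?(?P<f>[a-z][a-z0-9]*)?"
)

def parse_cccpppf(stem):
    """Return the parts of a cccpppf filename stem, or None if it does not fit."""
    m = CCCPPPF_RE.fullmatch(stem)
    if m is None:
        return None
    return {
        "controlpage": m.group("ccc"),
        "printpage": m.group("ppp"),          # None when not recorded
        "features": m.group("f") or "",
        "is_target": m.group("ccc") == "000",  # targets are named 000
    }
```

Under these assumptions, "000" is recognized as a target with no printpage, and a hypothetical "012010g" is read as controlpage 012, printpage 010, feature "g."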
Treatment of serials in Coolidge was unique in its handling of the four cover images: front, inside front, inside back, and back. The convention is to use the cccpppf filename type. Without looking in the directory to confirm, Carl thinks this is what is there in the case where we have two issues of a serial under one bib record (in the listing, the digit between the controlpage and the printpage appears to be the issue number):
000 target
0011000a front cover, first of two issues
0021000b inside fc
0031001 printpage 1, issue one
0041002 printpage 2
etc.
0561054 printpage 54, issue one
0571999a inside back cover, issue one (slight chance this was "b")
0581999b back cover
0592000a front cover issue 2
0602000b inside front cover, issue 2
0612001 printpage 1 issue 2
0622002 printpage 2
etc.
1222060 printpage 60, issue 2
1232999a inside back cover issue 2
1242999b back cover issue 2
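The listing above can be decoded mechanically, as in this Python sketch. It is based only on Carl's unconfirmed recollection of the directory, so the layout (three-digit controlpage, one issue digit, three-digit printpage, optional feature letter) is an assumption, as are the names describe_coolidge and COVERS.

```python
import re

COOLIDGE_RE = re.compile(
    r"(?P<ccc>\d{3})(?P<issue>\d)(?P<ppp>\d{3})(?P<f>[a-z]?)"
)

# Cover meaning depends on both the printpage (000 or 999) and the letter.
COVERS = {
    ("000", "a"): "outside front cover",
    ("000", "b"): "inside front cover",
    ("999", "a"): "inside back cover",
    ("999", "b"): "outside back cover",
}

def describe_coolidge(stem):
    """Return (controlpage, issue, description), or None if the stem does not fit."""
    m = COOLIDGE_RE.fullmatch(stem)
    if m is None:
        return None
    cover = COVERS.get((m.group("ppp"), m.group("f")))
    what = cover if cover else "printpage %d" % int(m.group("ppp"))
    return (int(m.group("ccc")), int(m.group("issue")), what)
```

The bare target name "000" does not fit this layout and returns None, so targets would need to be handled separately.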
Feature codes in filenames
National Digital Library Program
Comments: caar@loc.gov
(03/03/2000)