The USMARC Formats: Background and Principles

The following statement of background and principles for content designation in the USMARC formats was approved in 1982 and revised in 1989 by the American Library Association's RTSD/LITA/RASD Machine-Readable Bibliographic Information Committee (MARBI), in consultation with representatives from United States and Canadian national libraries and designated bibliographic networks. The statement includes the principles under which the USMARC formats were developed and constitutes a set of working principles for the ongoing process of format development. This document will be revised as necessary.

1. Introduction
1.1. The USMARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form.
1.2. A USMARC record involves three elements: the record structure, the content designation, and the data content of the record.
1.2.1. The structure of USMARC records is an implementation of national and international standards, e.g., Bibliographic Information Interchange (ANSI Z39.2) and Format for Bibliographic Information Interchange on Magnetic Tape (ISO 2709).
1.2.2. Content designation, the codes and conventions established to identify explicitly and characterize further the data elements within a record and to support the manipulation of those data, is defined in the USMARC formats.
1.2.3. The content of most data elements is defined by standards outside the formats, e.g., Anglo-American Cataloguing Rules, Library of Congress Subject Headings, National Library of Medicine Classification. The content of other data elements, e.g., coded data (see section 9. below), is defined in the USMARC formats.
1.3. A USMARC format is a set of codes and content designators defined for encoding a particular type of machine-readable record. USMARC formats are defined for the following types of data: bibliographic, holdings, and authority.
1.3.1. USMARC Format for Bibliographic Data contains format specifications for encoding data elements needed to describe, retrieve, and control various forms of bibliographic material. The USMARC Format for Bibliographic Data is an integrated format defined for the identification and description of different forms of bibliographic material. USMARC specifications are defined for books, archival and manuscripts control, computer files, maps, music, visual materials, and serials. With the full integration of the previously discrete bibliographic formats, consistent definition and usage are maintained for different forms of material.
1.3.2. USMARC Format for Holdings Data contains format specifications for encoding data elements pertinent to holdings and location data for all forms of material.
1.3.3. USMARC Format for Authority Data contains format specifications for encoding data elements that identify or control the content and content designation of those portions of a bibliographic record that may be subject to authority control.
1.4. The USMARC formats are maintained by the Library of Congress in consultation with various user communities.
1.4.1. Through maintenance and revision, content designation is added to and existing content designation is made obsolete or deleted from formats. Content designation is made obsolete when it is found to be no longer appropriate or when the data element involved is no longer needed. An obsolete content designator may continue to appear in records created prior to the date it was made obsolete. Obsolete content designators are not used in new records. A deleted content designator is one that had been reserved in USMARC but had not been defined or one that had been defined but it is known with near certainty that it had not been used.
1.4.2. The principles stated in this document have developed over time. The formats contain exceptions to the principles due to early format development decisions.
While many exceptions have been made obsolete, others remain because of the need to maintain upward compatibility of the formats in current development.

2. General Considerations
2.1. The USMARC formats are communication formats, primarily designed to provide specifications for the exchange of bibliographic and related information between systems. They are widely used in a variety of exchange and processing environments. As communication formats, they do not mandate internal storage or display formats to be used by individual systems.
2.2. The USMARC formats, particularly the bibliographic and authority formats, were developed to enable the Library of Congress to communicate its catalog records to other institutions. The formats have had a close relationship to the needs and practices of United States libraries. They reflect both the various cataloging codes applied in the library community and the requirements of the archives community.
2.3. The USMARC formats were designed to facilitate the exchange of bibliographic and related information on magnetic tape within the United States. An attempt has been made to preserve compatibility with other national and international formats, e.g., CANMARC and UNIMARC. Lack of international agreement on cataloging codes and practices has made complete compatibility impossible.
2.4. National agencies in the United States and Canada (Library of Congress, National Agricultural Library, National Library of Medicine, United States Government Printing Office, and National Library of Canada) are given special emphasis and consideration in the formats because they serve as sources of authoritative cataloging and as agencies responsible for certain data elements.
2.5. The institutions responsible for the content, content designation, and transcription accuracy of bibliographic and authority data within a USMARC record are identified at the record level in field 008/39 (Fixed-Length Data Elements--Cataloging source) and in field 040 (Cataloging Source). This responsibility may be evaluated in terms of the following rule.
2.5.1. Responsible Parties Rule:
2.5.1.1. Unmodified records--The institution identified as the cataloging institution (field 040$a) is considered responsible for data content in the record except for agency-assigned data (see section 2.5.2.1. below). The institution identified as the transcribing institution (field 040$c) is considered responsible for content designation and transcription accuracy for all data.
2.5.1.2. Modified records--Institutions identified as cataloging or modifying institutions (field 040$a,$d) are considered collectively responsible for data content in the record except for agency-assigned and authoritative-agency data (see section 2.5.2. below). Institutions identified as transcribing or modifying institutions (field 040$c,$d) are considered collectively responsible for content designation and transcription accuracy.
2.5.2. Exceptions to Responsible Parties Rule:
2.5.2.1. Certain data elements are defined in the USMARC formats as being exclusively assigned by particular agencies, e.g., International Standard Serial Number (field 022), Library of Congress Control Number (field 010). The content of such agency-assigned elements is always the responsibility of the agency.
2.5.2.2. Certain data elements have been defined in the USMARC formats in relation to one or more authoritative agencies that maintain the lists or rules upon which the data is based, e.g., Library of Congress Call Number (field 050), National Library of Medicine Call Number (field 060).
Where it is possible for other agencies to create similar or identical content for these data elements, content designation may be provided to distinguish between content actually assigned by the authoritative agency and that assigned by other agencies. In the former case, responsibility for content rests with the authoritative agency. In the latter case, the Responsible Parties Rule applies, and no further identification of the assigning agency is provided.
2.6. The USMARC bibliographic format provides content designation only for data that are applicable to all copies of the bibliographic entity described.
2.6.1. Information which applies only to some copies (or even to a single copy) of a title may be of interest beyond the institutions holding such copies. The USMARC formats provide limited content designation for the encoding of this information and for identifying the holding institution, e.g., subfield $5 in the 700-740 added entry fields in the bibliographic format.
2.6.2. Information that does not apply to all copies of a title, and is not of interest to other institutions, is coded in local fields. For instance, the 59X block is reserved for local notes in the bibliographic format (see section 6.7 below).
2.7. Although a USMARC record is usually autonomous, data elements are provided that contain information used to link related records. These linkages may be implicit, through identical access points in each record, or explicit, through a linking entry field. The 76X-78X linking entry fields in the bibliographic format may contain either selected data elements that identify the related item or a control number that identifies the related record. In addition, an explicit code in the leader identifies a record that is linked to another record through a control number.

3. Structural Features
3.1. The USMARC formats are an implementation of the Bibliographic Information Interchange (ANSI Z39.2).
The formats also incorporate other relevant ANSI standards, e.g., Magnetic Tape Labels and File Structure for Information Interchange (ANSI X3.27).
3.2. All information in a USMARC record is stored in character form. USMARC communications records are coded in Extended ASCII, as defined in the USMARC Specifications for Record Structure, Character Sets, Tapes.
3.3. The length of each variable field can be determined either from the length-of-field portion of the directory entry or from the occurrence of the field terminator character [hexadecimal 1E, 8-bit]. The length of a record can be determined either from the logical record length element in Leader/00-04 or from the occurrence of the record terminator character [hexadecimal 1D, 8-bit]. The location of each variable field is explicitly stated in the starting character position element in its directory entry.

4. Content Designation
4.1. The goal of content designation is to identify and characterize the data elements that comprise a USMARC record with sufficient precision to support manipulation of the data for a variety of functions.
4.2. USMARC content designation is designed to support functions that include:
  a. Display--the formatting of data for display on a CRT, for printing on 3x5 cards or in book catalogs, for production of COM catalogs, or for other visual presentation of the data.
  b. Information retrieval--the identification, categorization, and retrieval of any identifiable data element in a record.
4.3. Some fields serve multiple functions. For example, field 245 (Title Statement) serves both as the bibliographic transcription of the title and the statement of responsibility and as an access point for the title.
4.4. The USMARC formats provide for display constants. A display constant is a term, phrase, and/or spacing or punctuation convention that may be system generated under prescribed circumstances to make a visual presentation of data in a record more meaningful to a user.
Such display constants are not carried in the data, but may be supplied for display by the processing system. For example, subfield $x in Series Statement field 490 (and in some other fields) implies the display constant "ISSN"; also, the combination of tag 780 (Preceding Entry) and second indicator value 3 implies the display constant "Supersedes in part:".
4.5. The USMARC formats support the sorting of data only to a limited extent. In general, sorting must be accomplished through the application of external algorithms to the data.

5. Organization of the Record
5.1. A USMARC record consists of three main sections: the leader, the directory, and the variable fields.
5.2. The leader consists of data elements that contain coded values and are identified by relative character position. Data elements in the leader define parameters for processing the record. The leader is fixed in length (24 characters) and occurs at the beginning of each USMARC record.
5.3. The directory contains the tag, starting location, and length of each field within the record. Directory entries for variable control fields appear first, in ascending tag order. Entries for variable data fields follow, arranged in ascending order according to the first character of the tag. The order of the fields in the record does not necessarily correspond to the order of directory entries. Duplicate tags are distinguished only by location of the respective fields within the record. The length of the directory entry is defined in the entry map elements in Leader/20-23. In the USMARC formats, the length of a directory entry is 12 characters. The directory ends with a field terminator character.
5.4. The data content of a record is divided into variable fields. The USMARC formats distinguish two types of variable fields: variable control fields and variable data fields. Control and data fields are distinguished only by structure (see sections 7 and 8 below).
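The organization described in sections 5.1 through 5.3 (a 24-character leader, a directory of 12-character entries, then the variable fields) can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the split of each directory entry into a 3-character tag, a 4-character field length, and a 5-character starting position, and the base address of data in Leader/12-16, are taken from the record structure specification rather than from this statement. The helper names (build_record, parse_directory, read_field) and the sample data are invented for the example.

```python
FIELD_TERMINATOR = b"\x1e"   # ends the directory and every variable field
RECORD_TERMINATOR = b"\x1d"  # ends the record
LEADER_LENGTH = 24
ENTRY_LENGTH = 12            # assumed split: tag (3) + length (4) + start (5)


def build_record(fields):
    """Assemble a minimal record from (tag, body) pairs (illustrative helper)."""
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        # Starting position is relative to the first variable field.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    directory += FIELD_TERMINATOR
    base = LEADER_LENGTH + len(directory)
    total = base + len(data) + 1  # +1 for the record terminator
    # Leader/00-04 record length, Leader/10-11 indicator and subfield code
    # counts ("22"), Leader/12-16 base address, Leader/20-23 entry map ("4500").
    leader = b"%05dnam  22%05d   4500" % (total, base)
    return leader + directory + data + RECORD_TERMINATOR


def parse_directory(record):
    """Return (record_length, base_address, [(tag, length, start), ...])."""
    leader = record[:LEADER_LENGTH]
    record_length = int(leader[0:5])
    base = int(leader[12:17])
    directory = record[LEADER_LENGTH:base - 1]  # drop the field terminator
    entries = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        entries.append((entry[0:3].decode("ascii"),
                        int(entry[3:7]), int(entry[7:12])))
    return record_length, base, entries


def read_field(record, base, length, start):
    """Slice one variable field out of the record, without its terminator."""
    raw = record[base + start:base + start + length]
    return raw[:-1] if raw.endswith(FIELD_TERMINATOR) else raw


sample = build_record([("001", b"12345"), ("245", b"10\x1faA sample title")])
total, base, entries = parse_directory(sample)
for tag, length, start in entries:
    print(tag, read_field(sample, base, length, start))
```

Because the tag lives only in the directory entry, the field bodies returned by read_field carry no tag of their own; the caller pairs each body with the tag from its entry.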
The term "fixed fields" is occasionally used in USMARC documentation, referring either to control fields generally or to specific coded-data fields, e.g., 007 (Physical Description Fixed Field) or 008 (Fixed-Length Data Elements).

6. Variable Fields and Tags
6.1. The data in a USMARC record is organized into fields, each identified by a three-character tag.
6.2. According to ANSI Z39.2, the tag must consist of alphabetic or numeric ASCII graphic characters, i.e., decimal integers 0-9 or letters A-Z (uppercase or lowercase, but not both). The MARC formats have used only numeric tags.
6.3. The tag is stored in the directory entry for the field, not in the field itself.
6.4. Variable fields are grouped into blocks according to the first character of the tag, which identifies the function of the data within a record, e.g., main entry, added entry, subject entry. The type of information in the field, e.g., personal name, corporate name, or title, is identified by the remainder of the tag.
6.4.1. Bibliographic format blocks:
  0XX = Control information, numbers, and codes
  1XX = Main entry
  2XX = Titles and title paragraph (title, edition, imprint)
  3XX = Physical description, etc.
  4XX = Series statements
  5XX = Notes
  6XX = Subject access fields
  7XX = Added entries other than subject or series; linking fields
  8XX = Series added entries, etc.
  9XX = Reserved for local implementation
6.4.2. Authority format blocks:
  0XX = Control information, numbers, and codes
  1XX = Heading
  2XX = Complex see references
  3XX = Complex see also references
  4XX = See from tracings
  5XX = See also from tracings
  6XX = Reference notes, treatment decisions, notes, etc.
  7XX = Not defined
  8XX = Not defined
  9XX = Reserved for local implementation
6.4.3. Holdings format blocks:
  0XX = Control information, numbers, and codes
  1XX = Not defined
  2XX = Not defined
  3XX = Not defined
  4XX = Not defined
  5XX = Notes
  6XX = Not defined
  7XX = Not defined
  8XX = Holdings and location data, notes
  9XX = Reserved for local implementation
6.5. Certain blocks in the USMARC bibliographic and authority formats contain data which may be subject to authority control (1XX, 4XX, 6XX, 7XX, 8XX for bibliographic records; 1XX, 4XX, 5XX for authority records).
6.5.1. In these blocks, certain parallels of content designation are preserved. The following meanings are generally given to the final two characters of the tag:
  X00 = Personal names
  X10 = Corporate names
  X11 = Meeting names
  X30 = Uniform titles
  X40 = Bibliographic titles
  X50 = Topical terms
  X51 = Geographic names
Further content designation (indicators and subfield codes) for data elements subject to authority control is defined consistently across the bibliographic and authority formats. These guidelines apply only to the main range of fields in each block, not to secondary ranges, e.g., the linking entry fields 760-787 in the bibliographic format.
6.5.2. Within fields subject to authority control, data elements may exist which are not subject to authority control and which may vary from record to record containing the same heading, e.g., subfield $e, Relator.
6.5.3. In fields not subject to authority control, each tag is defined independently. Parallel meanings have been preserved whenever possible.
6.6. Principles have been established to assist in determining when a separate field should be defined for note data and when the data should be included in a general note field.
6.6.1. In the USMARC bibliographic format, a specific 5XX note field is defined when at least one of the following is true:
  a. Categorical indexing or retrieval is required on the data defined for the note. The note is used for structured access purposes but does not have the nature of a controlled access point.
  b. Special manipulation of that specific category of data is a routine requirement. Such manipulation includes special print/display formatting or selection/suppression from display or printed product.
  c. Specialized structuring of information for reasons other than those given in (a) or (b), e.g., to support particular standards of data content when they cannot be supported in existing fields.
6.6.2. In the USMARC authority format, the specifications for notes are covered in the following two conditions:
  a. A specific note field is needed when special manipulation of that specific category of data is a routine requirement. Such manipulation includes special print/display formatting or selection/suppression from display or printed product.
  b. Multiple notes are generally not established to accommodate the same type of information for different types of authorities. Notes are thus not differentiated by or limited to subject, name, or series if the same information applies to more than one type.
6.7. Certain tags have been reserved for local implementation. The USMARC formats specify no structure or meaning for local fields. Communication of local fields between systems is governed by mutual agreements on the content and content designation of the fields communicated.
6.7.1. The 9XX block is reserved for local implementation.
6.7.2. In general, any tag containing the character 9 is reserved for local implementation within the block structure (see section 6.4 above).
6.7.3. The historical development of the USMARC formats has left one exception to this general principle: field 490 (Series Statement) in the bibliographic format. There are several obsolete fields with tags containing the character 9.
6.8. Theoretically, all fields, except field 001 (Control Number) and field 005 (Date and Time of Latest Transaction), may be repeated. The nature of the data, however, often precludes repetition.
For example, a bibliographic record may contain only one field 245 (Title Statement) and an authority record may contain only one 1XX heading field. The repeatability/nonrepeatability of each field is defined in the USMARC formats.

7. Variable Control Fields
7.1. The 00X fields in the USMARC formats are variable control fields.
7.2. Variable control fields consist of data and a field terminator. They contain neither indicators nor subfield codes (see sections 8.3 and 8.4 below).
7.3. Variable control fields contain either a single data element or a series of fixed-length data elements identified by relative character position.

8. Variable Data Fields
8.1. All fields except 00X are variable data fields.
8.2. Four levels of content designation are provided for variable data fields in ANSI Z39.2:
  a. a three-character tag, stored in the directory entry;
  b. indicators stored at the beginning of each variable data field, the number of indicators being reflected in Leader/10 (Indicator count);
  c. subfield codes preceding each data element, the length of the code being reflected in Leader/11 (Subfield code count); and
  d. a field terminator following the last data element in the field.
8.3. Indicators
8.3.1. Indicators contain values conveying information that interprets or supplements the data found in the field.
8.3.2. The USMARC formats specify two indicator positions at the beginning of each variable data field.
8.3.3. Indicators are defined independently for each field. Parallel meanings are preserved whenever possible.
8.3.4. Indicator values are interpreted independently; meaning is not ascribed to the two indicators taken together.
8.3.5. Indicators may be any lowercase alphabetic or numeric character or a blank (#). Numeric values are defined first. A blank (#) is used in an undefined indicator position or to mean "information not provided" in a defined indicator position.
8.3.6. The value 9 is reserved for local implementation.
8.4. Subfield Codes
8.4.1. Subfield codes identify data elements within a field that require (or might require) separate manipulation.
8.4.2. Subfield codes in the USMARC formats consist of two characters--a delimiter [hexadecimal 1F, 8-bit], followed by a data element identifier. A data element identifier may be any lowercase alphabetic or numeric character.
8.4.2.1. Numeric identifiers are defined for parametric data used to process the field, or coded data needed to interpret the field. (Note that not all numeric identifiers defined in the past have followed this specification.)
8.4.2.2. Alphabetic identifiers are defined for the separate elements that constitute the data content of the field.
8.4.2.3. The character 9 and the following graphic symbols are reserved for local definition as data element identifiers:
  ! " # $ % & ' ( ) * + , - . / : ; < = > ?
8.4.3. Subfield codes are defined independently for each field. Parallel meanings are preserved whenever possible.
8.4.4. Subfield codes are defined for purposes of identification, not arrangement. The order of subfields is specified by content standards, e.g., cataloging rules. In some cases, however, such specifications may be incorporated in the USMARC format documentation.
8.4.5. Theoretically, all data elements may be repeated. The nature of the data, however, often precludes repetition. The repeatability/nonrepeatability of each subfield code is defined in the USMARC formats.

9. Coded Data
9.1. In addition to content designation, the USMARC formats include specifications for the content of certain data elements, particularly those that provide for the representation of data by coded values.
9.2. Coded values consist of fixed-length character strings. Individual elements within a coded-data field or subfield are identified by relative character position.
9.3. Although coded data occur most frequently in the leader, directory, and variable control fields, any field or subfield may be defined for coded-data elements.
9.4. Certain common values have been defined whenever applicable:
  # = Undefined (element not defined)
  n = Not applicable (element is not applicable to the item)
  u = Unknown (record creator was unable to determine value)
  z = Other (value other than those defined for the element)
  | = Fill character (record creator has chosen not to provide information)
Historical exceptions do occur in the formats. In particular, the blank (#) often has been defined as "not applicable" or has been assigned a specific meaning.

STANDARDS AND OTHER DOCUMENTS RELATED TO USMARC FORMATS

National and international standards:
These publications are available from the American National Standards Institute, Inc., 1430 Broadway, New York, NY 10018.
  Bibliographic Information Interchange (ANSI Z39.2-1985)
  Format for Bibliographic Information Interchange on Magnetic Tape (ISO 2709-1981)
  Magnetic Tape Labels and File Structure for Information Interchange (ANSI X3.27-1987)

USMARC standards:
These publications are available from the Library of Congress, Cataloging Distribution Service, Washington, DC 20541.
  USMARC Concise Formats
  USMARC Format for Authority Data
  USMARC Format for Bibliographic Data
  USMARC Format for Holdings Data
  USMARC Specifications for Record Structure, Character Sets, Tapes
  USMARC Code List for Languages
  USMARC Code List for Countries
  USMARC Code List for Geographic Areas
  USMARC Code List for Relators, Sources, Descriptive Conventions
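
Illustrative sketch (not part of the approved statement): the variable data field layout of section 8, with two indicator positions followed by subfields that each begin with the delimiter and a one-character data element identifier, might be decoded along the following lines in Python. The function name parse_data_field and the sample field content are invented for the example, and the field is assumed to have been decoded to text already.

```python
DELIMITER = "\x1f"         # first character of every subfield code
FIELD_TERMINATOR = "\x1e"  # follows the last data element in the field


def parse_data_field(field):
    """Split a variable data field into (indicators, subfields).

    Assumes the two indicator positions that the USMARC formats specify
    and two-character subfield codes (delimiter plus identifier).
    """
    if field.endswith(FIELD_TERMINATOR):
        field = field[:-1]
    if len(field) < 2:
        raise ValueError("a variable data field needs two indicator positions")
    indicators = (field[0], field[1])
    subfields = []
    # Anything before the first delimiter is ignored; each later chunk
    # starts with the data element identifier.
    for chunk in field[2:].split(DELIMITER)[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:]))
    return indicators, subfields


# Invented sample resembling a title field: indicators "1" and "0",
# then subfields $a and $c.
sample_field = "10\x1faBackground and principles /\x1fcMARBI.\x1e"
indicators, subfields = parse_data_field(sample_field)
print(indicators, subfields)
```

The subfields are returned as an ordered list of pairs rather than a dictionary because subfield codes may repeat and, per section 8.4.4, their order is set by content standards rather than by the codes themselves.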