SGML and MARC
"SGML" stands for Standard Generalized Markup Language. The
standard was issued by the International Organization for
Standardization (ISO) in 1986. It allows publishers and
authors to represent documents in such a way that the text
may be separated from the structure without regard to the
particular word-, text- or media-processing system being
used. Documents which conform to the SGML format may be
exchanged and processed on many different systems in many
different ways. This represents an enormous advance over
traditional word processors (e.g., Microsoft Word,
WordPerfect, DisplayWrite), which mix commands that
affect visual properties with the text or content of the
document, resulting in a "closed" environment that thwarts
one's ability to interchange and integrate documents
created on different systems. Let's continue
refining our definitions.
SGML
- is a language for specification of markup languages
(like HTML, TEI, etc.)
- is a set of rules--syntax and semantics--for
designing markup languages
- is a system of "metalanguage" constructs
- is a kit for constructing text-description (markup)
languages
- is flexible enough to define an infinite number of
markup languages (one for memos, one for
books, one for hypermedia, etc.)
DTD
- means Document Type Definition
- is an SGML application (i.e., is written "in SGML")
- defines the structure of a particular type of
document
- more precisely, a DTD defines a specific set of tags
and markup commands (element names, whether they are
repeatable, what order they should be in, what kinds of
markup can be omitted, the contents of elements, tag
attributes and their default values, names of permissible
entities, and typewriter conventions that may be used).
- may be stored externally to the document
A Document instance
is the document itself which:
- contains data (contents)
- contains markup
- includes a reference to the DTD (if it is external
to the document)
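The relationship between a DTD and a document instance can be sketched in a few lines of Python. This is only a toy model, not SGML syntax: a real DTD is written in SGML declarations and also covers attributes, entities, and markup omission. The names MEMO_DTD, memo_instance, and validate, and the memo element names, are hypothetical and chosen for illustration.

```python
# Toy illustration only: MEMO_DTD, memo_instance and validate are
# hypothetical names, and this is not real SGML declaration syntax.

# "DTD": for each element, the ordered child elements it may contain and
# whether each child is repeatable. Leaf elements (empty list) hold only text.
MEMO_DTD = {
    "memo": [("to", False), ("from", False), ("para", True)],
    "to": [],
    "from": [],
    "para": [],
}

# "Document instance": (element name, content), where content is either
# a text string or a list of child elements.
memo_instance = ("memo", [
    ("to", "Cataloging Department"),
    ("from", "Law Library"),
    ("para", "SGML separates structure from appearance."),
    ("para", "A DTD defines which elements may appear, and in what order."),
])


def validate(element, dtd):
    """Return True if the element follows the content model in dtd."""
    name, content = element
    if name not in dtd:
        return False  # element is not declared in this document type
    model = dtd[name]
    if isinstance(content, str):
        return model == []  # text is allowed only in leaf elements
    children = list(content)
    position = 0
    for child_name, repeatable in model:
        # each declared child must occur at least once, in the declared order
        if position >= len(children) or children[position][0] != child_name:
            return False
        position += 1
        while (repeatable and position < len(children)
               and children[position][0] == child_name):
            position += 1
    if position != len(children):
        return False  # leftover children that the model does not allow
    return all(validate(child, dtd) for child in children)


print(validate(memo_instance, MEMO_DTD))
```

The same MEMO_DTD could check any number of memo instances, which is the sense in which a DTD "may be stored externally to the document."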
HTML
- HyperText Markup Language
- is an SGML-conforming markup language
- is the initialism often displayed as the DTD
declaration in the markup at the top of a hypermedia
document
- allows different Web browsers to render a single
document differently (because HTML conforms to SGML, the
structure commands are indicated separately from the text,
data, graphics, etc.).
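To make the last point concrete, here is a minimal Python sketch in which two imaginary "browsers" display the same marked-up source differently. It uses the standard library's html.parser module; the class StyledRenderer, the function render, and the sample SOURCE string are assumptions made for this illustration and do not describe how any actual browser works.

```python
from html.parser import HTMLParser

# Sample markup invented for this sketch: structure only, no appearance commands.
SOURCE = ("<h1>SGML and MARC</h1>"
          "<p>Structure is marked up <em>separately</em> from appearance.</p>")


class StyledRenderer(HTMLParser):
    """A hypothetical 'browser' whose display choices are passed in."""

    def __init__(self, heading_style, emphasis_mark):
        super().__init__()
        self.heading_style = heading_style  # function applied to heading text
        self.emphasis_mark = emphasis_mark  # string placed around emphasized text
        self.parts = []
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_heading = True
        elif tag == "em":
            self.parts.append(self.emphasis_mark)

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False
            self.parts.append("\n")
        elif tag == "em":
            self.parts.append(self.emphasis_mark)

    def handle_data(self, data):
        self.parts.append(self.heading_style(data) if self.in_heading else data)


def render(source, heading_style, emphasis_mark):
    """Feed the same source to a renderer configured with its own style."""
    renderer = StyledRenderer(heading_style, emphasis_mark)
    renderer.feed(source)
    renderer.close()
    return "".join(renderer.parts)


# Two "browsers", one source document.
browser_a = render(SOURCE, str.upper, "*")
browser_b = render(SOURCE, lambda text: "== " + text + " ==", "_")
print(browser_a)
print(browser_b)
```

Neither display required any change to SOURCE, because the markup says only what each piece of text is (a heading, an emphasized phrase), not how it should look.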
United States Patent and Trademark Office Home Page
(URL=http://www.uspto.gov/)
- is an example of a document instance
- is written "in HTML"
- when displayed in a Web browser's "source" view, shows
"HTML" as the DTD declaration at the top of the document.
MARC
- stands for MAchine Readable Cataloging
- is a language that does not conform to SGML
- is a language which allows library [archives,
museum, etc.] information to be stored, shared, and
manipulated by computer in a consistent manner, regardless
of the variations in resulting appearances of the data in
different online systems.
- has various applications, or formats (e.g., formats
for bibliographic data, for authorities data, for holdings
and locations, for classification data--there's even a
format for community information!)
- each format defines a specific set of fixed-length
and variable-length fields, and tags, indicators, and
subfield codes useful for processing the text/content of
different elements (e.g., title) and attributes or
properties (e.g., language, physical form, target audience)
of the entities described in bibliographic and related
records.
- USMARC is a widely accepted standard which itself
is based on ANSI Z39.2, American National Standard for
Bibliographic Information Interchange, first promulgated in
1971, and revised in 1985. (Note the words "information
interchange" in the standard's title, and consider how this
parallels the objectives of SGML.)
- was developed originally to support library
automation, whereas SGML was developed to enhance automation
in the publishing/authorship industries.
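The tags, indicators, and subfield codes mentioned above can be sketched in Python as follows. This is a simplified illustration of a single variable data field, not a full record: it omits the leader and directory that the interchange format also requires. The class DataField and the two display functions are hypothetical names; the delimiter characters are the subfield delimiter (hex 1F) and field terminator (hex 1E) used in the Z39.2 record structure, and tag 245 is the title statement field of the bibliographic format.

```python
from dataclasses import dataclass

SUBFIELD_DELIMITER = "\x1f"  # precedes each subfield code
FIELD_TERMINATOR = "\x1e"    # ends each field


@dataclass
class DataField:
    """Hypothetical model of one variable data field."""
    tag: str         # three-character field tag, e.g. "245" (title statement)
    indicators: str  # two indicator characters
    subfields: list  # list of (code, value) pairs

    def encode(self):
        """Serialize indicators and subfields as they are stored in a record."""
        body = "".join(SUBFIELD_DELIMITER + code + value
                       for code, value in self.subfields)
        return self.indicators + body + FIELD_TERMINATOR

    @classmethod
    def decode(cls, tag, raw):
        """Rebuild a field from its stored form (the tag comes from the directory)."""
        indicators, rest = raw[:2], raw[2:].rstrip(FIELD_TERMINATOR)
        chunks = rest.split(SUBFIELD_DELIMITER)[1:]  # nothing precedes the first delimiter
        return cls(tag, indicators, [(chunk[0], chunk[1:]) for chunk in chunks])


def tagged_display(field):
    """One possible display: a cataloger's tagged view."""
    parts = " ".join("$" + code + " " + value for code, value in field.subfields)
    return field.tag + " " + field.indicators + " " + parts


def title_display(field):
    """Another possible display: only the title subfields ($a and $b)."""
    return " ".join(value for code, value in field.subfields if code in ("a", "b"))


title = DataField("245", "10", [("a", "MARC for library use /"),
                                ("c", "Walt Crawford.")])
stored = title.encode()
print(tagged_display(DataField.decode("245", stored)))
print(title_display(title))
```

The stored form is the same for every system that receives the record; only the display functions differ, which mirrors the point above about consistent storage despite varying appearances in different online systems.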
Ms Stone wishes to acknowledge the following
people or sources for contributing to this introduction:
Thomas Bruce (Cornell Law School) and Chris Corcos and Judy
Kaul (both at Case Western Reserve University Law Library),
who shared some handouts on HTML and Web publishing; Eric
van Herwijnen, who authored Practical SGML, 2nd ed. (Kluwer
Academic, 1994); Walt Crawford, the author of MARC for
Library Use, 2nd ed. (G.K. Hall, 1989); and the following
readers of the newsgroup "comp.text.sgml" who offered
guidance on why it is incorrect to call HTML a subset of
SGML: Ray Dassen, Dept. of Mathematics & Computer Science
at Leiden University, The Netherlands; C.M. Sperberg-McQueen,
ACH/ACL/ALLC Text Encoding Initiative at University
of Illinois at Chicago; and David Durand, Boston University
Computer Science.
Any errors or misstatements in the text above are
entirely the responsibility of Ms Stone, however, and do not
necessarily reflect the opinions of other scientists or
authors.
Send comments or suggestions
to: Alva T. Stone