Titre du document / Document title
A formal language model for parsing SGML
Auteur(s) / Author(s)
MATZEN R. W. (1) ;
GEORGE K. M. (1) ;
HEDRICK G. E. (1) ;
Affiliation(s) du ou des auteurs / Author(s) Affiliation(s)
(1) Computer Science Department, Oklahoma State University, Stillwater, Oklahoma, United States
Résumé / Abstract
The Standard Generalized Markup Language (SGML) is an international standard for document definition (ISO 8879) that was adopted in 1986 and is rapidly gaining acceptance in industry and government. It is a meta-language system for document design rather than a specific scheme for document processing; almost any kind of document can be described using SGML. Productions called element declarations are used to define arbitrary elements of documents and the context in which they can occur. A finite set of element declarations called a document type definition (DTD) defines the high-level syntax of a set of documents. DTDs are similar to context-free grammars, but the productions are more complex. The standard does not describe a formal language model for SGML, and there is little work in the literature on this topic. This article defines a formal language model for SGML: systems of finite automata constructed from systems of regular expressions. This model is applied in two ways: a parser is constructed for DTDs, and methods are shown for automatically constructing parsers for the documents defined by a DTD. These methods for parsing SGML are new, and they include features of DTDs that have not previously been included in a static language model. The model applies directly to the syntactic constructs of SGML, and thus the methods shown in this article have distinct advantages for parsing SGML over traditional context-free parsing methods.
Revue / Journal Title
The Journal of systems and software
ISSN
0164-1212
CODEN JSSODM
Source / Source
1997, vol. 36, no. 2, pp. 147-166 (17 ref.)
Langue / Language
English
Editeur / Publisher
Elsevier, New York, NY, United States
(1979)
(Journal)
Localisation / Location
INIST-CNRS, INIST shelf number: 18071, 35400006331566.0040
Refdoc record no. (ud4): 2575525