Titre du document / Document title
A formal language model for parsing SGML
Auteur(s) / Author(s)
MATZEN R. W. (1) ; GEORGE K. M. (1) ; HEDRICK G. E. (1)
Affiliation(s) du ou des auteurs / Author(s) Affiliation(s)
(1) Computer Science Department, Oklahoma State University, Stillwater, Oklahoma, USA
Résumé / Abstract
The Standard Generalized Markup Language (SGML) is an international standard for document definition (ISO 8879) that was adopted in 1986 and is rapidly gaining acceptance in industry and government. It is a meta-language system for document design rather than a specific scheme for document processing; almost any kind of document can be described using SGML. Productions called element declarations define arbitrary elements of documents and the contexts in which they can occur. A finite set of element declarations, called a document type definition (DTD), defines the high-level syntax of a set of documents. DTDs are similar to context-free grammars, but their productions are more complex. The standard does not describe a formal language model for SGML, and there is little work in the literature on this topic. This article defines a formal language model for SGML based on systems of regular expressions, from which systems of finite automata are constructed. The model is applied in two ways: a parser is constructed for DTDs, and methods are shown for automatically constructing parsers for the documents defined by a DTD. These methods for parsing SGML are new, and they cover features of DTDs that have not previously been included in a static language model. Because the model applies directly to the syntactic constructs of SGML, the methods shown in this article have distinct advantages for parsing SGML over traditional context-free parsing methods.
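As an illustration of the idea in the abstract (a DTD seen as a system of regular expressions, one per element declaration), the following minimal sketch may help. It is not the authors' construction: the toy DTD, the names `TOY_DTD`, `compile_model` and `validate`, and the nested-tuple document representation are all assumptions made for this example, and Python's `re` engine stands in for the finite automata. Only the `,` and `|` connectors and the `?`, `*`, `+` occurrence indicators are handled; the `&` connector, inclusions and exclusions are left out.

```python
import re

# Hypothetical toy DTD: element name -> content model (not taken from the article).
TOY_DTD = {
    "memo": "(to, from, body)",
    "to": "(#PCDATA)",
    "from": "(#PCDATA)",
    "body": "(para | list)+",
    "para": "(#PCDATA)",
    "list": "(item+)",
    "item": "(#PCDATA)",
}

# Element names (optionally '#'-prefixed, as in #PCDATA) and model punctuation.
TOKEN = re.compile(r"#?[A-Za-z][A-Za-z0-9.-]*|[(),|?*+]")


def compile_model(model):
    """Translate one content model into a regex over space-terminated child names."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation in a regex
        elif tok in ")|?*+":
            parts.append(tok)
        else:
            # Each name is followed by one space so that names cannot run together.
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def validate(node, automata):
    """Check a (name, children) tree; string children count as #PCDATA."""
    name, children = node
    word = "".join(
        ("#PCDATA" if isinstance(c, str) else c[0]) + " " for c in children
    )
    if automata[name].fullmatch(word) is None:
        return False
    return all(validate(c, automata) for c in children if not isinstance(c, str))


# One compiled expression per element declaration: the "system".
AUTOMATA = {name: compile_model(model) for name, model in TOY_DTD.items()}

good = ("memo", [("to", ["Ann"]), ("from", ["Bob"]),
                 ("body", [("para", ["Hi"]), ("list", [("item", ["x"])])])])
bad = ("memo", [("from", ["Bob"]), ("to", ["Ann"]),
                ("body", [("para", ["Hi"])])])

print(validate(good, AUTOMATA))
print(validate(bad, AUTOMATA))
```

Each element is checked against its own expression and the check then recurses into the children, which is one simple way to read "a system" of expressions rather than a single grammar.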
Revue / Journal Title
The Journal of systems and software
ISSN 0164-1212
CODEN JSSODM
Source / Source
1997, vol. 36, no. 2, pp. 147-166 (17 ref.)
Langue / Language
English
Editeur / Publisher
Elsevier, New York, NY, USA (1979) (Journal)
Mots-clés anglais / English Keywords
Software engineering ;
Programming language ;
Formal language ;
Finite automaton ;
Grammar ;
Information system ;
Document processing ;
Mots-clés français / French Keywords
Génie logiciel ;
Langage programmation ;
Langage formel ;
Automate fini ;
Grammaire ;
Système information ;
Traitement document ;
Langage SGML ;
Mots-clés espagnols / Spanish Keywords
Ingeniería logiciel ;
Lenguaje programación ;
Lenguaje formal ;
Autómata estado finito ;
Gramática ;
Sistema información ;
Tratamiento documento ;
Localisation / Location
INIST-CNRS, INIST shelf number: 18071, 35400006331566.0040
Refdoc record no. (ud4): 2575525