Expat XML Processor

Denys Duchier



This is an interface to James Clark's expat library for parsing XML documents. Module Processor.ozf exports a class that implements a SAX-like processor. SAX events are invoked as methods, and each application should specialize these methods to do something useful. The processor can be used, for example, as follows:

    functor
    import
       Processor at 'x-ozlib://duchier/xml/expat/Processor.ozf'
    define
       ...
       class MyProcessor from Processor.'class'
          ...
          meth startElement(Tag Attribs) ... end
          ...
       end
       ...
    end


To build this package, you first need an installation of James Clark's expat library that provides the header file expat.h and the shared object library libexpat.so. You can download such a distribution from expat's SourceForge site: expat.sourceforge.net

Example Application

The distribution includes an example.oz application. It can be invoked in the following way:

    example URL1 ... URLn

where input is obtained from the given URLs (or files) as if they had been concatenated. If no URL is given, standard input is used. The input is assumed to contain an XML document. The structure of this document is printed on standard output. The amount of indentation used for this purpose can be changed with the --index option:

    example --index=3 ...

uses 3 additional indentation spaces for each level of XML nesting. The default is 2. For example, after this package has been installed, you can execute:

    ozengine x-ozlib://duchier/xml/expat/{example,testa.xml,testb.xml}

which results in the following printout:

    /NONE one
    /NONE two
    /FOO/BAR two
    /OTHER three
    /OTHER three
    /OTHER three
    /OTHER four
    /OTHER four
    /NONE two
    /NONE three

The argument files are testa.xml and testb.xml.



Module Processor.ozf exports, on feature class, a class definition for a SAX-like XML processor. SAX events are invoked as methods. By default these methods do nothing; it is up to each application to specialize them to do something useful. See the definition of class MyProcessor above. It can be instantiated as follows:

    P={New MyProcessor initFromFile(Path)}

More generally, the init method takes one argument, which is either an InputSource object or a list of specs for creating an InputSource object.

The processor object P can deliver SAX events one at a time through method getEvent($). Alternatively, method parse can be invoked to process all SAX events in a loop until the end. See the API for a detailed list of the methods corresponding to SAX events, and see example.oz for illustrative code.
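As a rough illustration of the pieces described so far, the following sketch defines a subclass that overrides only startElement, creates an instance with initFromFile, and runs parse. The class name ShowTags and the path 'input.xml' are placeholders invented for this example, and the sketch assumes that parse takes no arguments and that initFromFile accepts an atom as its path; neither assumption is confirmed by this document, so consult the API and example.oz for the actual conventions.

```oz
functor
import
   Application
   System
   Processor at 'x-ozlib://duchier/xml/expat/Processor.ozf'
define
   %% Hypothetical subclass: shows each start-element event as it arrives.
   %% All other SAX event methods keep their default (do nothing) behavior.
   class ShowTags from Processor.'class'
      meth startElement(Tag Attribs)
         %% System.show prints any Oz value, so no assumption is made
         %% here about how Tag and Attribs are represented.
         {System.show startElement(Tag Attribs)}
      end
   end

   %% 'input.xml' is a placeholder path, not a file shipped with the package.
   P = {New ShowTags initFromFile('input.xml')}

   %% Assumed to take no arguments: process all SAX events until the end.
   {P parse}
   {Application.exit 0}
end
```

Because only startElement is specialized, the other events are silently ignored, which keeps the sketch short at the cost of not showing element content or end tags.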


This package also contains an XML parser that is implemented by subclassing the processor described above. This parser builds a representation of the XML document as a term, which is made available on feature root of the parser object.
