Lib/xml

From Liberty Eiffel Wiki

XML parsing

XML is widely used nowadays, even by the library itself: the storable repository uses it (see lib/storage).

The parser

To use the XML parser, you need to proceed in two steps:

  1. Connect the parser to an input stream, using the XML_PARSER.connect_to feature;
  2. Parse the document with the XML_PARSER.parse feature, which sends parsing events to the events receiver you provide. Either you provide a real events receiver, SAX-fashion (just inherit from XML_CALLBACKS), or you provide an XML_TREE, DOM-fashion (the tree implements the events receiver interface and builds its nodes as it receives events from the parser).
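Liberty Eiffel's own XML_PARSER is not reproduced here. As a loose analogy only, the following Python sketch uses the standard-library xml.sax module, which follows the same pattern: the parser reads from an input stream and sends parsing events to an events receiver supplied by the caller. The names TagCollector and collect_tags are hypothetical.

```python
import io
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical events receiver: records the name of every element."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # Called by the parser each time it enters a node.
        self.tags.append(name)


def collect_tags(document: str) -> list:
    parser = xml.sax.make_parser()
    receiver = TagCollector()
    # Give the parser its events receiver...
    parser.setContentHandler(receiver)
    # ...then parse from an input stream; events flow to the receiver.
    parser.parse(io.BytesIO(document.encode("utf-8")))
    return receiver.tags


print(collect_tags("<library><book/><book/></library>"))
# ['library', 'book', 'book']
```

Swapping TagCollector for another receiver changes what is done with the events without touching the parsing code.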

Events interface ("SAX")

SAX stands for "Simple API for XML". Liberty Eiffel's API is not identical to the SAX APIs found in other languages, but it resembles them.

In this case, you have to inherit from XML_CALLBACKS and implement its many deferred features.

During parsing, the XML parser calls features of that class to report that it has entered a node or found an attribute, some text, and so on. Errors are reported as well.
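To make SAX-fashion callbacks concrete, here is a minimal Python sketch using the standard xml.sax module as a stand-in for XML_CALLBACKS; Python's callback names (startElement, characters, endElement, fatalError) differ from Liberty Eiffel's deferred features. The hypothetical EventLogger records each event it receives, and the hypothetical ErrorLogger shows errors arriving through a callback too.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical events receiver: records every event as a short string."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser enters a node; its attributes come along.
        self.events.append(f"enter {name}")
        for attr_name in attrs.getNames():
            self.events.append(f"attribute {attr_name}={attrs.getValue(attr_name)}")

    def characters(self, content):
        # Called with text; whitespace-only chunks are ignored here.
        if content.strip():
            self.events.append(f"text {content.strip()}")

    def endElement(self, name):
        self.events.append(f"leave {name}")


class ErrorLogger(xml.sax.ErrorHandler):
    """Hypothetical error receiver: records a fatal error, then stops parsing."""

    def __init__(self):
        self.errors = []

    def fatalError(self, exception):
        self.errors.append(str(exception))
        raise exception


def log_events(document: bytes):
    handler, error_logger = EventLogger(), ErrorLogger()
    try:
        xml.sax.parseString(document, handler, error_logger)
    except xml.sax.SAXParseException:
        pass  # already recorded by ErrorLogger.fatalError
    return handler.events, error_logger.errors


events, errors = log_events(b'<greeting lang="en">Hello</greeting>')
print(events)
# ['enter greeting', 'attribute lang=en', 'text Hello', 'leave greeting']
```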

Tree interface ("DOM")

DOM stands for "Document Object Model". Liberty Eiffel's DOM implementation is not endorsed by the W3C, but its API is equivalent.

To use an XML tree, pass it to the XML parser as the events receiver; the tree builds itself as the parser calls it back with events.

If parsing produced no errors, the XML tree's root feature gives the document's root node, an XML_NODE.

Errors can be handled with the with_error_handler feature: the given agent is called with the errors the parser finds.
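A minimal DOM-fashion sketch in Python, using the standard xml.dom.minidom module as an analogy for XML_TREE: the whole document is turned into a tree first, and its root element is then available for navigation. The hypothetical on_error parameter only imitates the agent given to with_error_handler; minidom stops at the first error, so in this sketch the callback is called at most once.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def build_tree(document, on_error):
    """Build the whole tree, then return its root element (or None on error).

    `on_error` is a hypothetical stand-in for the agent given to
    with_error_handler: it is called with the error the parser reports.
    """
    try:
        tree = minidom.parseString(document)
    except ExpatError as error:
        on_error(error)  # minidom stops at the first error
        return None
    return tree.documentElement


errors = []
root = build_tree("<library><book title='Eiffel'/></library>", errors.append)
print(root.tagName)  # library
print([book.getAttribute("title") for book in root.getElementsByTagName("book")])
# ['Eiffel']
```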

Small comparison: SAX vs. DOM

Choosing one or the other is not only a matter of taste; it also depends on your priorities. In other words, you have to weigh performance against simplicity.

Topic: Simplicity
  DOM: Very simple. The whole tree is built in a preliminary pass, and all the nodes remain available for as long as needed afterwards.
  SAX: Less simple. The parser provides a stream of events that have to be dealt with. On the other hand, you have more control over which events matter and what you actually do with them.

Topic: Memory
  DOM: Memory-hungry, since objects are built to represent the whole tree.
  SAX: Not necessarily memory-hungry, since there is only a stream of feature calls. You may of course need to create your own objects, but that is up to you.

Topic: Performance
  DOM: The XML document is used in two passes:
    1. create the tree (parse the whole document);
    2. use the tree (navigate it to find the nodes of interest).
  SAX: The XML document can be processed in a single pass, since you have first-hand access to the events during parsing.
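The performance difference can be seen in a small Python sketch, with the standard-library xml.dom.minidom and xml.sax modules as stand-ins and a hypothetical document. Counting the books in a given language takes two passes with the DOM approach, but only one pass and a single counter with the SAX approach.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOCUMENT = b"<library><book lang='en'/><book lang='fr'/><book lang='en'/></library>"


def count_dom(document, lang):
    # Pass 1: build the whole tree in memory.
    tree = minidom.parseString(document)
    # Pass 2: navigate the tree to find the nodes of interest.
    return sum(1 for book in tree.getElementsByTagName("book")
               if book.getAttribute("lang") == lang)


class BookCounter(xml.sax.ContentHandler):
    """Hypothetical receiver that keeps only a counter, not a tree."""

    def __init__(self, lang):
        super().__init__()
        self.lang = lang
        self.count = 0

    def startElement(self, name, attrs):
        # Single pass: each node is examined once, as the event arrives.
        if name == "book" and attrs.get("lang") == self.lang:
            self.count += 1


def count_sax(document, lang):
    counter = BookCounter(lang)
    xml.sax.parseString(document, counter)
    return counter.count


print(count_dom(DOCUMENT, "en"), count_sax(DOCUMENT, "en"))  # 2 2
```

Both give the same answer; the SAX version never holds more than the current event and one integer.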

Validation

To validate XML files, Liberty Eiffel provides a DTD parser: the XML parser can parse the provided DTD and validate your file against it.

The parser accepts a DTD from any source: inline, in a local file, or on the network. You can also use a cache that serves network DTDs locally (see XML_DTD_PUBLIC_REPOSITORY).
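The idea behind a local DTD cache can be sketched with Python's standard xml.sax EntityResolver, which lets a parser map an entity's public identifier to another location before fetching it. This is only an analogy for XML_DTD_PUBLIC_REPOSITORY, whose real interface is not shown; the local path is hypothetical, and Python's built-in expat-based parser does not itself validate against DTDs.

```python
import xml.sax.handler


class LocalDTDCache(xml.sax.handler.EntityResolver):
    """Hypothetical cache mapping public identifiers to local DTD copies."""

    def __init__(self, local_copies):
        self.local_copies = dict(local_copies)

    def resolveEntity(self, publicId, systemId):
        # Known public identifier: read the local copy instead of the network.
        # Unknown one: fall back to the original system identifier.
        return self.local_copies.get(publicId, systemId)


cache = LocalDTDCache({
    # The local path is a hypothetical example.
    "-//W3C//DTD XHTML 1.0 Strict//EN": "dtd/xhtml1-strict.dtd",
})
```

A Python parser would use it through parser.setEntityResolver(cache).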

Namespaces and other advanced features

Namespaces are not yet natively implemented in Liberty Eiffel, but they will be added later (contributions welcome!).

The idea is to implement a new kind of XML_CALLBACKS that takes the colon in node names into account.
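The colon-splitting idea can be sketched in Python as a hypothetical SAX handler that resolves prefix:local names by hand, keeping a stack of the prefix-to-URI bindings declared with xmlns attributes. This illustrates the approach only and is not Liberty Eiffel code; Python's own xml.sax can already do this natively through its namespaces feature.

```python
import xml.sax


class NamespaceResolver(xml.sax.ContentHandler):
    """Hypothetical handler: splits element names on the colon by hand."""

    def __init__(self):
        super().__init__()
        # One prefix -> URI mapping per open element; "" is the default namespace.
        self.scopes = [{"xml": "http://www.w3.org/XML/1998/namespace"}]
        self.elements = []  # (namespace URI or None, local name)

    def startElement(self, name, attrs):
        scope = dict(self.scopes[-1])  # inherit the parent's bindings
        for attr_name in attrs.getNames():
            if attr_name == "xmlns":
                scope[""] = attrs.getValue(attr_name)
            elif attr_name.startswith("xmlns:"):
                scope[attr_name[len("xmlns:"):]] = attrs.getValue(attr_name)
        self.scopes.append(scope)
        prefix, _, local = name.rpartition(":")  # "dc:title" -> ("dc", ":", "title")
        self.elements.append((scope.get(prefix), local))

    def endElement(self, name):
        self.scopes.pop()  # bindings go out of scope with the element


doc = (b'<lib xmlns="urn:lib" xmlns:dc="http://purl.org/dc/elements/1.1/">'
       b'<dc:title>Eiffel</dc:title></lib>')
handler = NamespaceResolver()
xml.sax.parseString(doc, handler)
print(handler.elements)
# [('urn:lib', 'lib'), ('http://purl.org/dc/elements/1.1/', 'title')]
```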

Schema validation (which relies on namespaces) can be set up by calling XML_CALLBACKS.set_validator.

Path parsing is not implemented yet.