Lib/xml
XML parsing
XML is widely used these days, including by the library itself (the storable repository uses it; see lib/storage).
The parser
Using the XML parser takes two steps:
- Connect the parser to an input stream, using the XML_PARSER.connect_to feature;
- Parse the document with the XML_PARSER.parse feature, which sends parsing events to the events receiver you provide. The receiver is either a real events receiver, SAX-fashion (just inherit from XML_CALLBACKS), or an XML_TREE, DOM-fashion (this tree implements the events receiver interface and builds its nodes as it receives events from the parser).
Events interface ("SAX")
SAX means "Simple API for XML". SmartEiffel's API is not approved by W3C; nonetheless it resembles the API found in other languages.
To use this interface, inherit from XML_CALLBACKS and implement its many deferred features.
During parsing, the XML parser calls your class to tell it when it enters a node, finds an attribute, finds some text, and so on. Errors are also reported.
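Since the events interface resembles the SAX APIs of other languages, the callback flow can be sketched with Python's standard xml.sax module. This is an analogy only, not SmartEiffel code: NodePrinter and collect_events are hypothetical names, and the handler methods are Python's, not the deferred features of XML_CALLBACKS.

```python
import xml.sax


class NodePrinter(xml.sax.ContentHandler):
    """Records one line per parsing event (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser enters a node; its attributes arrive with it.
        self.events.append(f"open {name}")
        for key in attrs.getNames():
            self.events.append(f"attribute {key}={attrs.getValue(key)}")

    def characters(self, content):
        # Called for text between tags; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(f"text {content.strip()}")

    def endElement(self, name):
        # Called when the parser leaves a node.
        self.events.append(f"close {name}")


def collect_events(document: bytes) -> list:
    """Parse the document and return the recorded events, errors included."""
    handler = NodePrinter()
    try:
        xml.sax.parseString(document, handler)
    except xml.sax.SAXParseException as error:
        # Errors are reported as well, here as one extra event.
        handler.events.append(f"error line {error.getLineNumber()}")
    return handler.events
```

Calling collect_events(b'<a id="1"><b>hi</b></a>') yields the events in document order. Nothing is kept in memory beyond what the handler chooses to store.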
Tree interface ("DOM")
DOM means "Document Object Model". As with SAX, SmartEiffel's DOM implementation is not endorsed by W3C.
To use an XML tree, you have to give it to the XML parser. The tree builds itself when called back by the parser's events.
If there were no errors, the XML tree provides a root feature that is an XML_NODE.
Errors can be managed by using the with_error_handler feature. The given agent will be called with the errors the parser finds.
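The same workflow (build the tree, read its root, route errors to a handler) can be sketched with Python's standard xml.dom.minidom module. Again, this is an analogy and not SmartEiffel's API: load_root, child_names and the on_error callback are hypothetical names standing in for the root feature and the agent given to with_error_handler.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def load_root(document: str, on_error):
    """Return the root element, or None after reporting a parse error."""
    try:
        tree = minidom.parseString(document)
    except ExpatError as error:
        # The callback plays the role of the error-handling agent.
        on_error(str(error))
        return None
    return tree.documentElement


def child_names(node):
    """List the tag names of the element children of a node."""
    return [child.tagName for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]
```

For example, load_root("<a><b/><c/></a>", print) returns the node for a, and child_names on that node gives the names of its two children.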
Small Comparison SAX vs. DOM
Choosing one or the other is not only a matter of taste; it also depends on your priorities. In short, it is performance vs. simplicity.
Topic | DOM | SAX |
Simplicity | Very simple: the whole tree is built in a preliminary pass and all the nodes are available for as long as needed afterwards. | Less simple: the parser provides a stream of events that have to be dealt with. On the other hand you have more control over which events are important and what you actually do with them. |
Memory | Memory hungry, since objects are built to represent the whole tree. | Not necessarily memory hungry, since there is only a stream of feature calls. Of course you may need to create your own objects, but that's up to you. |
Performance | The XML document is used in two passes: the tree is built first, then it is used. | The XML document may be exploited in one pass, since you have first-hand access to the events during the parsing. |
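The trade-off in the table can be illustrated by counting nodes both ways. The sketch below uses Python's standard library as a stand-in for the two interfaces. The element name item and the function names are assumptions made up for the example.

```python
import xml.sax
from xml.dom import minidom


class ItemCounter(xml.sax.ContentHandler):
    """Keeps a single integer, whatever the size of the document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_streaming(document: bytes) -> int:
    # One pass: the count is updated while the parser emits events.
    counter = ItemCounter()
    xml.sax.parseString(document, counter)
    return counter.count


def count_with_tree(document: bytes) -> int:
    tree = minidom.parseString(document)           # pass 1: build every node
    return len(tree.getElementsByTagName("item"))  # pass 2: walk the tree
```

Both functions return the same number. The tree version is shorter to write, while the streaming version never holds more than one counter.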
Validation
To validate an XML file, SmartEiffel provides a DTD parser. The XML parser may parse the provided DTD and validate your file.
The parser recognizes any kind of DTD: inlined, in a file or on the network. You may also use a cache that locally provides network DTDs (see XML_DTD_PUBLIC_REPOSITORY).
Namespaces and other advanced features
Namespaces are not yet natively implemented in SmartEiffel, but they will be added later (contributions welcome!).
The idea is to implement a new kind of XML_CALLBACKS that takes the colon in node names into account.
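In XML, a namespace prefix is separated from the local name by a colon, and prefixes are bound to URIs by xmlns attributes. A minimal sketch of what such a callbacks layer would have to compute, written in Python with hypothetical names (split_qualified_name, resolve) and not taken from SmartEiffel:

```python
def split_qualified_name(name: str):
    """Split 'prefix:local' at the first colon; no colon means no prefix."""
    prefix, colon, local = name.partition(":")
    if not colon:
        return "", name
    return prefix, local


def resolve(name: str, attributes: dict, inherited: dict):
    """Return ((namespace_uri, local_name), scope) for one node.

    'inherited' maps prefixes to URIs as declared by enclosing nodes;
    the returned scope also includes the declarations found on this node.
    """
    scope = dict(inherited)
    for key, value in attributes.items():
        if key == "xmlns":
            scope[""] = value                    # default namespace
        elif key.startswith("xmlns:"):
            scope[key[len("xmlns:"):]] = value   # prefixed namespace
    prefix, local = split_qualified_name(name)
    return (scope.get(prefix), local), scope
```

A receiver built this way would call resolve when a node opens, pass the returned scope down to the children, and drop it when the node closes.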
Schema validation (which relies on namespaces) may be enabled by calling XML_CALLBACKS.set_validator.
Path parsing has yet to be implemented.