Difference between revisions of "Lib/xml"
Hzwakenberg (talk | contribs) m (type) |
Hzwakenberg (talk | contribs) m |
Latest revision as of 16:38, 30 July 2024
XML parsing
XML is used a lot these days, even by the library itself (the storable repository uses it, see lib/storage).
The parser
To use the XML parser, you need to proceed in two steps:
- Connect the parser to the input it should read, using the XML_PARSER.connect_to feature;
- Parse the document using the XML_PARSER.parse feature, which sends parsing events to the events receiver you provide. The receiver is either a real events receiver, SAX-fashion (just inherit from XML_CALLBACKS), or an XML_TREE, DOM-fashion (the tree implements the events receiver interface and builds its nodes as it receives events from the parser).
Events interface ("SAX")
SAX means "Simple API for XML". Liberty Eiffel's API is not exactly identical; nonetheless it resembles the API found in other languages.
In this case, you have to inherit from XML_CALLBACKS and implement its many deferred features.
During parsing, the XML parser calls back your object to report that it has entered a node, found an attribute, found some text, and so on. Errors are reported the same way.
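Since the events interface resembles the SAX API found in other languages, the callback style can be illustrated with Python's standard xml.sax module. This is only an analogy: the class and function names below (EventLogger, collect_events) are made up for the example, and the method names are Python's, not the deferred features of XML_CALLBACKS.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records every parsing event it receives, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser enters a node; its attributes arrive with it.
        self.events.append(("open", name, dict(attrs)))

    def characters(self, content):
        # Called for text; whitespace-only chunks are ignored here.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        # Called when the parser leaves a node.
        self.events.append(("close", name))


def collect_events(document):
    """Parse a byte string and return the list of events that were seen."""
    handler = EventLogger()
    xml.sax.parseString(document, handler)
    return handler.events


if __name__ == "__main__":
    for event in collect_events(b'<book id="1"><title>Eiffel</title></book>'):
        print(event)
```

The handler never sees the document as a whole. It only reacts to each event as it arrives, which is what keeps this style light on memory.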
Tree interface ("DOM")
DOM means "Document Object Model". Liberty Eiffel's DOM implementation is not endorsed by W3C but its API is equivalent.
To use an XML tree, you have to give it to the XML parser. The tree builds itself when called back by the parser's events.
If there were no errors, the XML tree provides a root feature that is an XML_NODE.
Errors can be managed by using the with_error_handler feature. The given agent will be called with the errors the parser finds.
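The same build-then-walk workflow can be sketched with Python's standard xml.dom.minidom module, again purely as an analogy. Here documentElement plays the role of the root feature, and the on_error callable stands in for the agent passed to with_error_handler. The helper names load_tree and titles are hypothetical, and Liberty Eiffel's actual signatures may differ.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def load_tree(document, on_error):
    """Build the whole tree; hand any parse error to on_error and return None."""
    try:
        return xml.dom.minidom.parseString(document)
    except ExpatError as error:
        on_error(str(error))
        return None


def titles(tree):
    """Walk the finished tree and return the text of every <title> node."""
    root = tree.documentElement  # the root node of the tree
    return [node.firstChild.data for node in root.getElementsByTagName("title")]


if __name__ == "__main__":
    errors = []
    tree = load_tree("<books><book><title>Eiffel</title></book></books>", errors.append)
    print(titles(tree))
    load_tree("<books><book></books>", errors.append)  # malformed on purpose
    print(errors)
```

The tree is only usable once parsing has finished without errors, so the caller checks the result before asking for the root.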
Small Comparison SAX vs. DOM
Choosing one or the other is not only a matter of taste; it also depends on your priorities. In other words, you have to weigh performance against simplicity.
Topic | DOM | SAX
Simplicity | Very simple: the whole tree is built in a preliminary pass and all the nodes are available for as long as needed afterwards. | Less simple: the parser provides a stream of events that have to be dealt with. On the other hand, you have more control over which events are important and what you actually do with them.
Memory | Memory hungry, since objects are built to represent the whole tree. | Not necessarily memory hungry, since there is only a stream of feature calls. Of course you may need to create your own objects, but that's up to you.
Performance | The XML document is used in two passes: first to build the tree, then to use it. | The XML document may be exploited in one pass, since you have first-hand access to the events during the parsing.
Validation
To validate an XML file, Liberty Eiffel provides a DTD parser. The XML parser can parse the provided DTD and validate your file against it.
The parser recognizes any kind of DTD: inline, in a file, or on the network. You may also use a cache that locally provides network DTDs (see XML_DTD_PUBLIC_REPOSITORY).
Namespaces and other advanced features
Namespaces are not yet natively implemented in Liberty Eiffel, but they will be added later (contributions welcome!!)
The idea is to implement a new kind of XML_CALLBACKS that takes the colon in node names into account.
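In XML namespaces, a qualified name is written prefix:local, so a namespace-aware receiver mostly has to split names at the colon and look the prefix up in the bindings currently in scope. A minimal sketch of that step follows, in Python for illustration only. The function names and the bindings dictionary are hypothetical and not part of Liberty Eiffel.

```python
def split_qualified_name(name):
    """Split 'prefix:local' into (prefix, local); prefix is None when absent."""
    prefix, colon, local = name.partition(":")
    if not colon:
        return None, name
    return prefix, local


def resolve(name, bindings):
    """Return (namespace URI, local name) for an element name.

    bindings maps each prefix to its URI; the default namespace, if any,
    is stored under the key None. Unknown prefixes resolve to None.
    """
    prefix, local = split_qualified_name(name)
    return bindings.get(prefix), local


if __name__ == "__main__":
    in_scope = {"xs": "http://www.w3.org/2001/XMLSchema"}
    print(resolve("xs:element", in_scope))
    print(resolve("book", in_scope))
```

A real implementation would also have to track the xmlns declarations as nodes open and close, which this sketch leaves out.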
Schema validation (therefore using namespaces) may be set by calling XML_CALLBACKS.set_validator.
Path parsing still has to be implemented.