Difference between revisions of "Lib/xml"

From Liberty Eiffel Wiki

Latest revision as of 16:38, 30 July 2024

XML parsing

XML is used a lot these days, even by the library itself (the storable repository uses it, see lib/storage).

The parser

To use the XML parser, you need to proceed in two steps:

  1. Connect the parser to an input stream, using the XML_PARSER.connect_to feature;
  2. Parse the document by sending parsing events to the provided events receiver, using the XML_PARSER.parse feature. Either you use a real events receiver, SAX-fashion (just inherit from XML_CALLBACKS); or you can provide an XML_TREE, DOM-fashion (this tree implements the events receiver interface and builds its nodes when receiving events from the parser).
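The same two-step flow can be sketched with Python's standard xml.sax module, which follows a comparable events model. This is only an illustration in another language, not Liberty Eiffel code: ElementCounter and count_elements are hypothetical names, and the Python calls are not the XML_PARSER features.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Events receiver (hypothetical): counts the elements the parser reports."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called by the parser each time it enters a node.
        self.count += 1


def count_elements(xml_bytes):
    # Step 1: connect the parser to an input stream.
    stream = io.BytesIO(xml_bytes)
    parser = xml.sax.make_parser()
    # Step 2: parse, sending the parsing events to the provided receiver.
    receiver = ElementCounter()
    parser.setContentHandler(receiver)
    parser.parse(stream)
    return receiver.count
```

Calling count_elements(b"<a><b/><b/></a>") walks the document once and leaves the result in the receiver; the parser itself keeps nothing.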

Events interface ("SAX")

SAX means "Simple API for XML". Liberty Eiffel's API is not exactly identical; nonetheless it resembles the API found in other languages.

In this case, you have to inherit from XML_CALLBACKS and implement its many deferred features.

During parsing, the XML parser calls back your class to report that it enters a node, finds an attribute, finds some text, and so on. Errors are also reported.
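To give an idea of what such a stream of callbacks looks like, here is a minimal sketch using Python's standard xml.sax module. It is an analogue only: the names EventLog and log_events are hypothetical, and the Python method names are not the deferred features of XML_CALLBACKS.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Records every event the parser reports, in order (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # The parser enters a node, then reports each of its attributes.
        self.events.append(("open", name))
        for attr_name in attrs.getNames():
            self.events.append(("attribute", attr_name, attrs.getValue(attr_name)))

    def characters(self, content):
        # Text found inside the current node; whitespace-only runs are skipped.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("close", name))


def log_events(xml_text):
    log = EventLog()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), log)
    except xml.sax.SAXParseException as exc:
        # Errors are reported too; here they end the log.
        log.events.append(("error", exc.getLineNumber()))
    return log.events
```

For a malformed document such as "<a><b></a>", the events seen before the problem are still in the log, followed by a final "error" entry.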

Tree interface ("DOM")

DOM means "Document Object Model". Liberty Eiffel's DOM implementation is not endorsed by W3C but its API is equivalent.

To use an XML tree, you have to give it to the XML parser. The tree builds itself when called back by the parser's events.

If there were no errors, the XML tree provides a root feature that is an XML_NODE.

Errors can be managed by using the with_error_handler feature. The given agent will be called with the errors the parser finds.
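The idea of a tree that builds itself from parser events, exposes a root node, and hands errors to a caller-supplied handler can be sketched as follows. This uses Python's standard xml.sax module as a stand-in; Node, TreeBuilder, build_tree and on_error are hypothetical names and do not reflect the actual XML_TREE or XML_NODE interfaces.

```python
import xml.sax


class Node:
    """One node of the tree (hypothetical stand-in for a tree node class)."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """An events receiver that builds its nodes as the events arrive."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._open = []  # stack of nodes that are currently open

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self._open:
            self._open[-1].children.append(node)
        else:
            self.root = node
        self._open.append(node)

    def characters(self, content):
        if self._open:
            self._open[-1].text += content

    def endElement(self, name):
        self._open.pop()


def build_tree(xml_text, on_error):
    """Return the root node, or None after calling on_error with the message."""
    tree = TreeBuilder()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), tree)
    except xml.sax.SAXParseException as exc:
        on_error(exc.getMessage())
        return None
    return tree.root
```

Passing a list's append method as on_error, for example, collects the reported messages; the root is only returned when parsing finished without errors.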

Small Comparison SAX vs. DOM

Using one or the other is not only a matter of taste; it also depends on where you put your priorities. In short, you have to weigh performance against simplicity.

Topic: Simplicity
  DOM: Very simple: the whole tree is built in a preliminary pass and all the nodes are available for as long as needed afterwards.
  SAX: Less simple: the parser provides a stream of events that have to be dealt with. On the other hand you have more control over which events are important and what you actually do with them.

Topic: Memory
  DOM: Memory hungry since objects are built to represent the whole tree.
  SAX: Not necessarily memory hungry since there is only a stream of feature calls. Of course you may need to create your own objects but that's up to you.

Topic: Performance
  DOM: The XML document is used in two passes:
    1. create the tree (parse the whole document)
    2. use the tree (navigate in the tree to find nodes of interest)
  SAX: The XML document may be exploited in one pass since you have first-hand access to the events during the parsing.
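The two-pass versus one-pass difference can be illustrated by solving the same small task both ways. The sketch below uses Python's standard xml.dom.minidom and xml.sax modules purely as an analogy; the "book" element, its "title" attribute, and the names titles_dom, titles_sax and TitleCollector are made up for the example.

```python
import xml.dom.minidom
import xml.sax


def titles_dom(xml_text):
    # Pass 1: build the whole tree in memory.
    document = xml.dom.minidom.parseString(xml_text)
    # Pass 2: navigate the tree to find the nodes of interest.
    return [node.getAttribute("title")
            for node in document.getElementsByTagName("book")]


class TitleCollector(xml.sax.ContentHandler):
    """Keeps only what it needs while the events stream by."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "book":
            self.titles.append(attrs.get("title", ""))


def titles_sax(xml_text):
    # Single pass: the titles are collected during the parsing itself.
    collector = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), collector)
    return collector.titles
```

Both functions return the same list; the tree version is shorter to write, while the events version never holds more than the titles it has collected.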

Validation

To validate an XML file, Liberty Eiffel provides a DTD parser. The XML parser may parse the provided DTD and validate your file.

The parser recognizes any kind of DTD: inlined, in a file or on the network. You may also use a cache that locally provides network DTDs (see XML_DTD_PUBLIC_REPOSITORY).

Namespaces and other advanced features

Namespaces are not yet natively implemented in Liberty Eiffel, but they will be added later (contributions welcome!!)

The idea is to implement a new kind of XML_CALLBACKS that takes the colon in node names into account.
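As a rough sketch of that idea, an events receiver can split each name at the colon and resolve the prefix against the xmlns declarations currently in scope. The example below is written against Python's standard xml.sax module in its default, non-namespace mode; it is not a design for Liberty Eiffel, and split_qualified_name and NamespaceResolver are hypothetical names.

```python
import xml.sax


def split_qualified_name(name):
    """Split 'prefix:local' at the colon; a name without one has no prefix."""
    prefix, colon, local = name.partition(":")
    return (prefix, local) if colon else (None, name)


class NamespaceResolver(xml.sax.ContentHandler):
    """Plain events receiver that resolves namespace prefixes by itself."""

    def __init__(self):
        super().__init__()
        self.scopes = [{}]   # stack of prefix -> URI maps, one per open node
        self.resolved = []   # (URI or None, local name) for each opened node

    def startElement(self, name, attrs):
        # Declarations on this node extend the scope inherited from its parent.
        scope = dict(self.scopes[-1])
        for attr_name in attrs.getNames():
            if attr_name == "xmlns":
                scope[None] = attrs.getValue(attr_name)
            elif attr_name.startswith("xmlns:"):
                scope[attr_name[len("xmlns:"):]] = attrs.getValue(attr_name)
        self.scopes.append(scope)
        prefix, local = split_qualified_name(name)
        self.resolved.append((scope.get(prefix), local))

    def endElement(self, name):
        # Leaving the node discards the declarations it introduced.
        self.scopes.pop()
```

The sketch ignores attribute names with prefixes and undeclared prefixes, which a real implementation would have to handle.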

Schema validation (therefore using namespaces) may be set by calling XML_CALLBACKS.set_validator.

Path parsing has yet to be implemented.