Difference between revisions of "Parsing"
(typos + important note) |
Hzwakenberg (talk | contribs) m |
||
(9 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
− | The syntax parser is the component which reads |
+ | The syntax parser is the component which reads Liberty Eiffel source files and produces an AST (abstract syntax tree). This tree is a data structure which models more or less directly the '''text''' of the source file; it does not model semantic properties that are not directly on the text. For example, when parsing the class: |
<pre> |
<pre> |
||
Line 7: | Line 7: | ||
feature |
feature |
||
− | constant: INTEGER |
+ | constant: INTEGER 42 |
end |
end |
||
Line 16: | Line 16: | ||
= Where is the parser? = |
= Where is the parser? = |
||
− | The parser can be found at <code>tools/parser.e</code> and <code>tools/parser/*</code>. The real core of Eiffel parsing is at the [[tool_class:EIFFEL_PARSER|<tt>EIFFEL_PARSER</tt>]] class, although some of the basic parsing functionality (like knowing the position, parsing some of the tokens, and generating error messages) is in the [[tool_class:PARSER|<tt>PARSER</tt>]] parent class (which is also the parent of other parsers inside |
+ | The parser can be found at <code>tools/parser.e</code> and <code>tools/parser/*</code>. The real core of Eiffel parsing is at the [[tool_class:EIFFEL_PARSER|<tt>EIFFEL_PARSER</tt>]] class, although some of the basic parsing functionality (like knowing the position, parsing some of the tokens, and generating error messages) is in the [[tool_class:PARSER|<tt>PARSER</tt>]] parent class (which is also the parent of other parsers inside Liberty Eiffel, like the [[tool_class:INI_PARSER|config file parser]], and [[tool_class:ACE_PARSER|ACE file parser]]). |
There is exactly one instance of [[tool_class:EIFFEL_PARSER|<tt>EIFFEL_PARSER</tt>]] during the execution, shared through the <tt>[[tool_class:GLOBALS|GLOBALS]].eiffel_parser</tt> once function. It is used by the [[tool_class:SMART_EIFFEL|<tt>SMART_EIFFEL</tt>]] singleton, which keeps a dictionary of cached parsed classes (to avoid parsing a single file more than once). |
There is exactly one instance of [[tool_class:EIFFEL_PARSER|<tt>EIFFEL_PARSER</tt>]] during the execution, shared through the <tt>[[tool_class:GLOBALS|GLOBALS]].eiffel_parser</tt> once function. It is used by the [[tool_class:SMART_EIFFEL|<tt>SMART_EIFFEL</tt>]] singleton, which keeps a dictionary of cached parsed classes (to avoid parsing a single file more than once). |
||
− | ''Note: that's not true anymore in 2.3. Now the classes are cached by [[tool_class:ACE|<tt>ACE</tt>]] and the [[tool_class:CLUSTER|clusters]]; |
+ | ''Note: that's not true anymore in 2.3. Now the classes are cached by [[tool_class:ACE|<tt>ACE</tt>]] and the [[tool_class:CLUSTER|clusters]]; that is used to be a lot smarter about classes having the same name but not in the same cluster. See the [[class loading]] algorithm. --[[User:Cadrian|Cyril]] 11:30, 13 Apr 2007 (CEST)'' |
The parser reads from a [[tool_class:PARSER_BUFFER|<tt>PARSER_BUFFER</tt>]] object (also a shared singleton) which reads the whole input file in memory before parsing, and provides random access to the file. The buffer has also some metadata about the file, like the whole path, and the cluster where the file was found. |
The parser reads from a [[tool_class:PARSER_BUFFER|<tt>PARSER_BUFFER</tt>]] object (also a shared singleton) which reads the whole input file in memory before parsing, and provides random access to the file. The buffer has also some metadata about the file, like the whole path, and the cluster where the file was found. |
Latest revision as of 22:04, 14 May 2022
The syntax parser is the component which reads Liberty Eiffel source files and produces an AST (abstract syntax tree). This tree is a data structure which models more or less directly the text of the source file; it does not model semantic properties that are not directly on the text. For example, when parsing the class:
deferred class EXAMPLE inherit COLLECTION [ANY] feature constant: INTEGER 42 end
The AST will contain enough information to know that the class is deferred, it is called "EXAMPLE", it inherits from "COLLECTION [ANY]" and it has a feature called "constant" with a type mark "INTEGER" and a value. It will not have information about possible name clashes with parent features, bad redefinitions, inherited features, inherit invariants, or semantic information like that. If you are interested in that part of the compiling process, read the Semantics section.
Where is the parser?
The parser can be found at tools/parser.e
and tools/parser/*
. The real core of Eiffel parsing is at the EIFFEL_PARSER class, although some of the basic parsing functionality (like knowing the position, parsing some of the tokens, and generating error messages) is in the PARSER parent class (which is also the parent of other parsers inside Liberty Eiffel, like the config file parser, and ACE file parser).
There is exactly one instance of EIFFEL_PARSER during the execution, shared through the GLOBALS.eiffel_parser once function. It is used by the SMART_EIFFEL singleton, which keeps a dictionary of cached parsed classes (to avoid parsing a single file more than once).
Note: that's not true anymore in 2.3. Now the classes are cached by ACE and the clusters; that is used to be a lot smarter about classes having the same name but not in the same cluster. See the class loading algorithm. --Cyril 11:30, 13 Apr 2007 (CEST)
The parser reads from a PARSER_BUFFER object (also a shared singleton) which reads the whole input file in memory before parsing, and provides random access to the file. The buffer has also some metadata about the file, like the whole path, and the cluster where the file was found.
Code organization for the parser
The parser is somewhat like a recursive descent parser, doing a little backtracking from time to time. It is completely coded by hand (no automatic generation, no parsing tables).
The main feature of the EIFFEL_PARSER is the analyse_class
function, which takes a class name and returns the CLASS_TEXT for the requested class; CLASS_TEXT is the root of the syntax tree.
For each production in the syntax (see the syntax diagrams), there is a feature to parse it. The name of the feature has the prefix a_
followed by the production name; for example to parse "Feature clause" there is a routine called a_feature_clause
. Most of them use other routines of the parser to parse sub-components, sometimes coupled with a state machine. The basic tokens have also routines for detecting them, for example a_manifest_string
and a_class_name
; there is no hard separation between lexer and parser.
You will note a couple of different patterns in the signature of these functions. Some of them return BOOLEAN values, with True meaning "successfully parsed and advanced read position" and False meaning "failure parsing, parser buffer status kept the same as before". Take a look for example at a_assignment_or_procedure_call
. In those cases, the resulting parsed node is stored at one of
the last_xxx
features of the EIFFEL_PARSER (for the previous example, last_instruction
is used). However, some other parsing features directly return the parsed node, for example a_assertion
. There are also some variations, like a_class_declaration
which has no return value (it just sets last_class_text
). Finally, some of these routines receive a POSITION, usually when they are part of a larger construct and need the initial position (in the source text) of the construct for error reporting, or to store in the AST.
The a_xxx
used to be alphabetically sorted inside EIFFEL_PARSER at some time in the past. Now they are not so you will have to find/grep your way around :-)
Structure of the generated AST
(to be completed...)