The syntax parser is the component which reads source LibertyEiffel files and produces an AST (abstract syntax tree). This tree is a data structure which models more or less directly the text of the source file; it does not model semantic properties that are not directly on the text. For example, when parsing the class:
deferred class EXAMPLE inherit COLLECTION [ANY] feature constant: INTEGER is 42 end
The AST will contain enough information to know that the class is deferred, it is called "EXAMPLE", it inherits from "COLLECTION [ANY]" and it has a feature called "constant" with a type mark "INTEGER" and a value. It will not have information about possible name clashes with parent features, bad redefinitions, inherited features, inherit invariants, or semantic information like that. If you are interested in that part of the compiling process, read the Semantics section.
Where is the parser?
The parser can be found at
tools/parser/*. The real core of Eiffel parsing is at the EIFFEL_PARSER class, although some of the basic parsing functionality (like knowing the position, parsing some of the tokens, and generating error messages) is in the PARSER parent class (which is also the parent of other parsers inside smarteiffel, like the config file parser, and ACE file parser).
There is exactly one instance of EIFFEL_PARSER during the execution, shared through the GLOBALS.eiffel_parser once function. It is used by the SMART_EIFFEL singleton, which keeps a dictionary of cached parsed classes (to avoid parsing a single file more than once).
Note: that's not true anymore in 2.3. Now the classes are cached by ACE and the clusters; that is used to be a lot smarter about classes having the same name but not in the same cluster. See the class loading algorithm. --Cyril 11:30, 13 Apr 2007 (CEST)
The parser reads from a PARSER_BUFFER object (also a shared singleton) which reads the whole input file in memory before parsing, and provides random access to the file. The buffer has also some metadata about the file, like the whole path, and the cluster where the file was found.
Code organization for the parser
The parser is somewhat like a recursive descent parser, doing a little backtracking from time to time. It is completely coded by hand (no automatic generation, no parsing tables).
For each production in the syntax (see the syntax diagrams), there is a feature to parse it. The name of the feature has the prefix
a_ followed by the production name; for example to parse "Feature clause" there is a routine called
a_feature_clause. Most of them use other routines of the parser to parse sub-components, sometimes coupled with a state machine. The basic tokens have also routines for detecting them, for example
a_class_name; there is no hard separation between lexer and parser.
You will note a couple of different patterns in the signature of these functions. Some of them return BOOLEAN values, with True meaning "successfully parsed and advanced read position" and False meaning "failure parsing, parser buffer status kept the same as before". Take a look for example at
a_assignment_or_procedure_call. In those cases, the resulting parsed node is stored at one of
last_xxx features of the EIFFEL_PARSER (for the previous example,
is used). However, some other parsing features directly return the parsed node, for example
a_assertion. There are also some variations, like
a_class_declaration which has no return value (it just sets
last_class_text). Finally, some of these routines receive a POSITION, usually when they are part of a larger construct and need the initial position (in the source text) of the construct for error reporting, or to store in the AST.
a_xxx used to be alphabetically sorted inside EIFFEL_PARSER at some time in the past. Now they are not so you will have to find/grep your way around :-)
Structure of the generated AST
(to be completed...)