Difference between revisions of "Parsing"
(→Where is the parser?: typo) |
(No difference)
|
Revision as of 07:17, 2 October 2006
The syntax analyser is the component which reads source SmartEiffel files and produces an AST (abstract syntax tree). This tree is a data structure which models more or less directly the text of the source file; it does not model semantic properties that are not directly on the text. For example, when parsing the class:
deferred class EXAMPLE inherit COLLECTION [ANY] feature constant: INTEGER is 42 end
The AST will contain enoguh information to know that the class is deferred, it is called "EXAMPLE", it inherits from "COLLECTION [ANY]" and it has a feature called "constant" with a type mark "INTEGER" and a value. It will not have information about possible name clashes with parent features, bad redefinitions, inherited features, inherit invariants, or semantic information like that. If you are interested on that part of the compiling process, read the Semantics section.
Where is the parser?
The parser can be found at tools/parser.e
and tools/parser/*
. The real core of Eiffel parsing is at the EIFFEL_PARSER class, although some of the basic parsing functionality (like knowing the position, parsing some of the tokens, and generating error messages) is in the PARSER parent class (which is shared by other parsers inside smarteiffel, like the config file parser, and ACE file parser).
There is exactly one instance of EIFFEL_PARSER during the execution, shared through the GLOBALS.eiffel_parser once function. It is used from the SMART_EIFFEL singleton, which keeps a dictionary of cached parsed classes (to avoid parsing a single file more than once).
The parser reads from a PARSER_BUFFER object (also a shared singleton) which reads the whole input file in memory before parsing, and provides random access to the file. The buffer has also some metadata about the file, like the whole path, and the cluster where the file was found.
Code organization for the parser
The parser is somewhat like a recursive descent parser, doing a little backtracking from time to time. It is completely coded by hand (no automatic generation, no parsing tables).
The main feature of the EIFFEL_PARSER is the analyse_class
function, which takes a class name and returns the CLASS_TEXT for the requested class; CLASS_TEXT is the root of the syntax tree.
For each production in the syntax (see Syntax diagrams), there is a feature to parse it. The name of the feature has the prefix a_
followed by the production name; for example to parse "Feature clause" there is a routine called a_feature_clause
. Most of them use other routines of the parser to parse sub-components, sometimes coupled with a state machine. The basic tokens have also routines for detecting them, for example a_manifest_string
and a_class_name
; there is no hard separation between lexer and parser.
You will note a couple of different patterns in the signature of these functions. Some of them return BOOLEAN values, with True meaning "successfully parsed and advanced read position" and False meaning "failure parsing, parser buffer status kept the same as before". Take a look for example at a_assignment_or_procedure_call
. In those cases, the resulting parsed node is stored at one of
the last_xxx
features of the EIFFEL_PARSER (for the previous example, last_instruction
is used). However, some other parsing features directly return the parsed node, for example a_assertion
. There are also some variations, like a_class_declaration
which has no return value (it just sets last_class_text
). Finally, some of these routines receive a POSITION, usually when they are part of a larger construct and need the initial position (in the source text) of the construct for error reporting, or to store in the AST.
The a_xxx
used to be alphabetically sorted inside EIFFEL_PARSER at some time in the past. Now they are not so you will have to find/grep your way around :-)
Structure of the generated AST
(to be completed...)