C code generation

From Liberty Eiffel Wiki
Revision as of 01:01, 27 February 2009 by SEWikiImport Dmoisset (Coding source positions)

General aspects

Generated files

When compiling to C, SmartEiffel generates the following output:

  • One or more C files
  • A C header file, projectname.h.
  • A type-to-id mapping file (see #Mapping types to IDs section below).
  • A C compilation script (projectname.make or projectname.bat depending on the platform).

If you are using the -no_split mode, all the C code will be put inside a projectname.c file. Otherwise, the code will be split into chunks of roughly equal size, named projectnameN.c, where N is a positive integer. The number of chunks depends on the size of the project. The policy for splitting files will probably change in SE 2.4.

The compilation script will contain a list of commands to invoke the system C compiler and compile the relevant files with the proper flags (extracted from the command line, ACE file and/or plugin options). It will only include compile commands for those C files which have no associated object file, or that have changed since the previous compilation. It will also contain a final linking command (to generate the executable) when any of the relevant C flags has changed.

The SmartEiffel runtime (i.e., auxiliary routines that handle mechanisms such as garbage collection and exceptions) and plugins are not linked, but embedded instead: the code for those components is copied into the generated .c/.h files. The runtime files are located in the sys/runtime/c directory of the compiler distribution. For example, when compiling with the GC enabled, the file sys/runtime/c/gc_lib.c is copied verbatim into one of the generated .c files, and the file sys/runtime/c/gc_lib.h is copied verbatim into projectname.h.

Mapping types to IDs

When compiling to C, every type existing in the system is assigned an id, which is a unique positive integer. This number is used in different parts of the C code instead of the actual type name. For example, if the type STRING is assigned id 7, the feature is_equal of type STRING will be mapped to a C routine called r7is_equal (the 7 in the name comes from the type id).
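The naming rule can be illustrated with a small C helper. This is a hypothetical sketch: make_routine_name is not part of SmartEiffel (the compiler is written in Eiffel and builds these names internally); it only shows how the type id and the feature name combine into the C identifier.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "r<type_id><feature>" into buf.
   Returns 0 on success, -1 if buf is too small. */
int make_routine_name(char *buf, size_t size, int type_id,
                      const char *feature)
{
    /* snprintf returns the length the full name would have; a value
       >= size means the output was truncated. */
    int n = snprintf(buf, size, "r%d%s", type_id, feature);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For type id 7 and feature is_equal this produces r7is_equal, matching the example in the text.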

Note that ids are assigned to types and not to classes. Most of the time there is one type per class, but a single class can give rise to several types. For example, even though ARRAY is a single class, ARRAY[INTEGER] and ARRAY[STRING] are distinct types, and as such get different ids.

A few of the types based on library classes (all INTEGER*, REAL*, CHARACTER, BOOLEAN, POINTER, NATIVE_ARRAY[CHARACTER] and STRING) have a predefined, fixed id. For details on the implementation of this fixed mapping, see the ID_PROVIDER class, especially its feature make.

After compilation, the mapping of types to ids is stored in a text file called project.id. This file is a useful aid for reading the generated C code. The compiler itself also reads it when recompiling, to keep the mapping stable between runs. A stable mapping produces similar generated code from one run to the next, which means that fewer C files need to be recompiled.
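As a rough illustration of how fixed and freshly assigned ids can coexist, here is a hypothetical C sketch of a type-to-id table. It is not the ID_PROVIDER implementation: the fixed ids are just the example values used in this article (2, 3, 5, 7, 9), and new types simply get the next unused number. Saving such a table to a file and reloading it on the next run is what would keep ids stable between compilations; that part is not shown.

```c
#include <string.h>

#define MAX_TYPES 256

/* Hypothetical table entry: a type name (e.g. "ARRAY[INTEGER]") and its id. */
struct id_entry { const char *type_name; int id; };

struct id_table {
    struct id_entry entries[MAX_TYPES];
    int count;
    int next_id;   /* smallest id not yet handed out */
};

/* Seeds the table with the fixed ids quoted in this article. */
void id_table_init(struct id_table *t)
{
    static const struct id_entry fixed[] = {
        {"INTEGER", 2}, {"CHARACTER", 3}, {"REAL_64", 5},
        {"STRING", 7}, {"NATIVE_ARRAY[CHARACTER]", 9},
    };
    size_t i;
    t->count = 0;
    t->next_id = 1;
    for (i = 0; i < sizeof fixed / sizeof fixed[0]; i++) {
        t->entries[t->count++] = fixed[i];
        if (fixed[i].id >= t->next_id)
            t->next_id = fixed[i].id + 1;
    }
}

/* Returns the id of type_name, assigning a new one on first use,
   or -1 if the table is full. type_name must outlive the table. */
int id_table_lookup(struct id_table *t, const char *type_name)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].type_name, type_name) == 0)
            return t->entries[i].id;
    if (t->count == MAX_TYPES)
        return -1;
    t->entries[t->count].type_name = type_name;
    t->entries[t->count].id = t->next_id++;
    return t->entries[t->count++].id;
}
```

Looking up ARRAY[INTEGER] and ARRAY[STRING] yields two different ids, as the text says for distinct generic derivations of one class.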

Mapping Eiffel types to C types

Not every type needs a direct C implementation. A lot of types are never instantiated, because they are deferred or just because the system never creates an instance of them. Types which are effectively used in the system are called "live" types. For each live type with id N, a C type is created called TN. For some of the standard base types, the definition of the type is hardcoded in the compiler; for example, in your project .h file you will find:

typedef int32_t T2; /* 2 is the id for INTEGER */
typedef double T5;  /* 5 is the id for REAL_64 */

For references (which might be polymorphic), the C type T0 is defined, and references are of type T0 *.

For most other live types, a C struct is used, where fields of the struct correspond to the attributes (proper and inherited) of the type. The struct of the type is called struct SN. For example, if the type STD_OUTPUT has id 38, you will get the following code in the .h file:

typedef struct S38 T38;
/* ... */
struct S38{Tid id;T0* _filter;T2 _buffer_position;T9 _buffer;T2 _capacity;};

Note that each field has the same name as the corresponding Eiffel attribute, preceded by an underscore (to avoid possible name clashes with C keywords, for example if a class has an attribute called "static" or "int"). If the attribute is of an expanded type, the field has the corresponding C type: in the struct above, the attribute buffer_position was declared as INTEGER, whose type has id 2, so its field is a T2; buffer is of type T9 because type id 9 was mapped to NATIVE_ARRAY[CHARACTER]. If the attribute is not of an expanded type, it is declared as a C field of type T0 *.

Also note that there is an additional field, Tid id. This is an integer field containing the type id of the structure; in this case it should always be set to 38. The field is used to identify the type whenever a pointer (usually a T0 *) may point to more than one kind of structure. That happens not only when the original source uses polymorphism, but also in some internal polymorphic functions in the stack-dump printing code and in the garbage collector. If the compiler decides that the id field is not needed (which happens often in boost mode with no GC, and also for expanded types), the field is omitted.

Now it is easy to explain the definition of type T0:

typedef struct S0 T0;
struct S0{Tid id;};
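Because every struct that keeps the id field starts with it, any object can be examined through a T0 * and then cast to its concrete struct once its id is known. The sketch below assumes Tid is a plain int and that CHARACTER (id 3) maps to char; capacity_or_minus_one is a hypothetical helper, not generated code.

```c
#include <stdint.h>

typedef int Tid;              /* assumption: plain int */
typedef int32_t T2;           /* INTEGER */
typedef char T3;              /* assumption: CHARACTER */
typedef T3 *T9;               /* NATIVE_ARRAY[CHARACTER] */

typedef struct S0 T0;
struct S0 { Tid id; };

/* Mirrors the STD_OUTPUT example (id 38). */
typedef struct S38 T38;
struct S38 { Tid id; T0 *_filter; T2 _buffer_position; T9 _buffer; T2 _capacity; };

/* Reads the id through the generic T0 view, and only then casts. */
T2 capacity_or_minus_one(T0 *obj)
{
    if (obj->id == 38)
        return ((T38 *)obj)->_capacity;
    return -1;   /* not a STD_OUTPUT object */
}
```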

Native arrays have a special implementation: the C type is defined as a pointer to the element type. When the element type is a reference type, the element type maps to T0 *, so:

  • NATIVE_ARRAY[CHARACTER] , with CHARACTER having type id 3, will be mapped to a T3 *
  • NATIVE_ARRAY[STRING] , with STRING having type id 7, will be mapped to a T0 **
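A sketch of the two native-array mappings, assuming CHARACTER maps to char and giving NATIVE_ARRAY[STRING] the made-up id 10. The allocation helpers are hypothetical; they only show that the C value of a native array is just a pointer to its first element.

```c
#include <stdlib.h>

typedef int Tid;               /* assumption: plain int */
typedef struct S0 T0;
struct S0 { Tid id; };
typedef char T3;               /* assumption: CHARACTER */

typedef T3 *T9;                /* NATIVE_ARRAY[CHARACTER] */
typedef T0 **T10;              /* NATIVE_ARRAY[STRING], hypothetical id 10 */

/* calloc zero-fills: 0 characters, and (on common platforms) null
   references for the T0 * elements. */
T9 new_char_array(size_t count) { return calloc(count, sizeof(T3)); }
T10 new_ref_array(size_t count) { return calloc(count, sizeof(T0 *)); }
```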

For each type, there is also an initialization constant that holds the default values for an object. The constant for the type with id N is called MN. For the basic types whose C definition is hardcoded, the constant also has a hardcoded value and is defined as a macro:

#define M5 (0.0) /* 5 is the id for REAL_64 */

For generic instantiations of NATIVE_ARRAY, the default is also defined as a macro and is always NULL:

#define M9 NULL

For structures, the initial value is an extern variable declared in the .h file, with its value set in one of the .c files. The initial value sets the id field (if present) to the correct type id, and every other field to its default value. For the example struct above, the initialization code is:

extern T38 M38; /* in the .h file */
 /* in the .c file */
T38 M38={38,(void*)0,0,(void*)0,0};
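One way to use such a template is to copy it into freshly allocated memory, which sets the id field and all attribute defaults in a single assignment. The sketch below does this with malloc; new_T38 is a hypothetical name, and the real runtime may allocate objects differently (for example through the garbage collector). Tid as int and CHARACTER as char are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int Tid;
typedef int32_t T2;
typedef char T3;
typedef T3 *T9;
typedef struct S0 T0;
struct S0 { Tid id; };
typedef struct S38 T38;
struct S38 { Tid id; T0 *_filter; T2 _buffer_position; T9 _buffer; T2 _capacity; };

/* Template with the id set and every attribute at its default. */
T38 M38 = {38, (void *)0, 0, (void *)0, 0};

/* Hypothetical allocator: one struct copy initializes the whole object. */
T38 *new_T38(void)
{
    T38 *n = malloc(sizeof *n);
    if (n != NULL)
        *n = M38;
    return n;
}
```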

(explain what happens when a type has no attributes)

Mapping Eiffel features to C code


Eiffel routines are mapped to C functions. For a routine called eiffel_name in the type with id N, a C function called rNeiffel_name is generated. That function:

  • Returns void for procedures; for functions, its return type is the usual C mapping of the result type (T0 * for references, TN for expanded types).
  • Has an argument se_dump_stack *caller. This argument is used for describing the activation record of the caller routine; more details about this are given in the #Exception handling section. In boost mode, this information is not needed and this argument is removed.
  • Has an argument called C whose type is the direct mapping of the type of Current. This is one of the few cases where references are not changed to T0 *: the specific id is used even for reference types. An expanded Current is declared as TN C, while a reference Current is declared as TN *C; no T0 * is needed because there is no possible polymorphism here. In a few cases, for routines that do not need information on the current instance, this argument is omitted.
  • Has arguments called a1, a2, ... for each of the arguments of the Eiffel routine. These arguments are mapped to C types in the usual way.
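Putting these rules together, a hypothetical reference type with id 40 and one INTEGER attribute x could have a function and a procedure translated roughly as below. The type, the feature names and the stub se_dump_stack struct are all invented for illustration; the real dump-stack record has more fields (see the Exception handling section).

```c
#include <stdint.h>

typedef int Tid;
typedef int32_t T2;

/* Stub only: the real se_dump_stack carries much more information. */
typedef struct se_dump_stack se_dump_stack;
struct se_dump_stack { se_dump_stack *caller; };

/* Hypothetical reference type with id 40 and attribute x: INTEGER. */
typedef struct S40 T40;
struct S40 { Tid id; T2 _x; };

/* Eiffel: shifted (delta: INTEGER): INTEGER do Result := x + delta end
   The result maps to T2, C is T40 * (not T0 *), a1 is delta. */
T2 r40shifted(se_dump_stack *caller, T40 *C, T2 a1)
{
    T2 R = 0;
    (void)caller;   /* would describe the caller's frame; absent in boost mode */
    R = C->_x + a1;
    return R;
}

/* Eiffel: set_x (v: INTEGER) do x := v end  (a procedure maps to void) */
void r40set_x(se_dump_stack *caller, T40 *C, T2 a1)
{
    (void)caller;
    C->_x = a1;
}
```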

Note that each routine in the Eiffel source may be mapped to multiple C functions, one per live descendant type (and generic variation). This happens even if there is no redefinition, because the C types may differ for the same piece of Eiffel code (due to anchors, generics, and the change of Current). When generating code for these routines, anchored types and generics are resolved to specific types and thus require no special handling.

When there is a call and the compiler can statically determine the run-time type of the call target, the call is mapped directly to one of these functions. When the target may be polymorphic, a "switching function" is generated. The switching function is called XMname, where M is the type id of the static target. This function has a prototype similar to the functions described above, but with C (the argument that passes Current) declared as T0 * (polymorphic calls are always made on reference targets). The body of the function is a switch or a set of nested ifs, which call the corresponding rPname when the run-time type of the object is the one with id P (P being the id of a type that conforms to the type with id M). In non-boost modes, the switching function may have an extra argument called position, holding an encoding of the source position of the call for error reporting purposes (see #Exception handling).
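A minimal sketch of a switching function, assuming a static target type with id 20 and two conforming live types with ids 21 and 22 (all ids and names hypothetical). For brevity the functions use boost-mode-style signatures, without the caller and position arguments.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int Tid;
typedef int32_t T2;
typedef struct S0 T0;
struct S0 { Tid id; };

/* Hypothetical live types conforming to the static type with id 20. */
typedef struct S21 T21;            /* e.g. SQUARE */
struct S21 { Tid id; T2 _side; };
typedef struct S22 T22;            /* e.g. RECTANGLE */
struct S22 { Tid id; T2 _width; T2 _height; };

T2 r21area(T21 *C) { return C->_side * C->_side; }
T2 r22area(T22 *C) { return C->_width * C->_height; }

/* Switching function for `area' on a target of static type 20:
   Current arrives as T0 * and its id selects the concrete rPname. */
T2 X20area(T0 *C)
{
    switch (C->id) {
    case 21: return r21area((T21 *)C);
    case 22: return r22area((T22 *)C);
    default: abort();   /* no other live type conforms to id 20 */
    }
}
```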

In boost mode, some simplifications may be made. In particular, some routines are inlined instead of being mapped to a C function, and the switching functions may be inlined too.


As seen before, Eiffel attributes have corresponding attributes in the C structures which represent instances. Attribute access is translated to structure field access when the type of the object can be decided at compile-time. However, an Eiffel expression x.attr, when the run-time type of x is not completely decided, can not be translated directly as field access, because the compiler doesn't know at compile time how to typecast the T0 * which represents x; and in fact a cast can not be done safely, because due to inheritance (possibly multiple), the attribute can be at different offsets in the structures that represent the possible live run time types of x.

In those cases, a switching function is also generated. Each branch of the switch checks for one live type and performs the proper typecast and field access for it.

Note that this also generalizes to the cases where a query is implemented in some subtypes as a function and in other subtypes as an attribute. In those cases, the switching function has some branches doing function calls and other branches doing field access.
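The following sketch combines both situations: a query count that is an attribute at different offsets in two live types and a function in a third. All type ids (30 to 33), names and layouts are hypothetical; the point is that each branch needs its own cast, and one branch calls a function instead of reading a field.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int Tid;
typedef int32_t T2;
typedef struct S0 T0;
struct S0 { Tid id; };

/* `count' is an attribute in 31 and 32 (different offsets) ... */
typedef struct S31 T31;
struct S31 { Tid id; T2 _count; };
typedef struct S32 T32;
struct S32 { Tid id; T0 *_items; T2 _count; };
/* ... and a function in 33. */
typedef struct S33 T33;
struct S33 { Tid id; T2 _lower; T2 _upper; };

T2 r33count(T33 *C) { return C->_upper - C->_lower + 1; }

/* Switching function for x.count with x of static type 30. */
T2 X30count(T0 *C)
{
    switch (C->id) {
    case 31: return ((T31 *)C)->_count;   /* field access */
    case 32: return ((T32 *)C)->_count;   /* same name, other offset */
    case 33: return r33count((T33 *)C);   /* function call */
    default: abort();
    }
}
```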

Once routines

For once routines, one or more C functions are generated, as for other routines. In addition, one or two global variables are generated: a flag to remember whether the routine has already been called and, for once functions, a second variable holding the cached result. The names of these variables contain the id of the class where the once routine is declared. "Id of a class" here means "the id of the type that directly represents the class"; there is always exactly one such type, because generic classes cannot have once features declared directly in them.

The flag variable is an int called fBCMname (fBC means "flag at base class"); the cached result is called oBCMname ("once at base class"), with its type mapped in the usual way. Both are declared in the .h file and initialized in one of the .c files. The functions for the live types (of which there may be several, many with type ids other than M) contain code like:

if (fBCMname==0) {
  fBCMname=1; {
    /* translation of routine body here, using oBCMname as `Result' */
  }
}
return oBCMname; /* Only in functions */

Note that using the id of the base class and not of the live type ensures that the once results are effectively shared systemwide (once-per-system instead of once-per-live-type).
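To make the sharing concrete, here is a hypothetical once function counter_seed declared in a class with type id 50 and inherited by a live type with id 51. Both generated C functions test and fill the same id-50 globals, so the body runs only once. The body_runs counter exists only to make that observable, and the Current and caller arguments are omitted on the assumption that the routine does not need them.

```c
#include <stdint.h>

typedef int32_t T2;

/* Globals named after the base class id (50), not the live type. */
int fBC50counter_seed = 0;
T2 oBC50counter_seed = 0;
int body_runs = 0;   /* instrumentation for this example only */

/* Copy generated for the live type with id 50. */
T2 r50counter_seed(void)
{
    if (fBC50counter_seed == 0) {
        fBC50counter_seed = 1;
        body_runs++;
        oBC50counter_seed = 42;   /* Result := 42 */
    }
    return oBC50counter_seed;
}

/* Copy generated for the heir with id 51: same globals, same flag. */
T2 r51counter_seed(void)
{
    if (fBC50counter_seed == 0) {
        fBC50counter_seed = 1;
        body_runs++;
        oBC50counter_seed = 42;
    }
    return oBC50counter_seed;
}
```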

Implementing local variables and Result values

Local variables in Eiffel routines are mapped one by one to local variables in the corresponding C function. Their C types follow the usual mapping (T0 * for references, TN for expanded types). Locals are declared and initialized at the top of the function. The name of each local is its Eiffel name preceded by an underscore. So, if you have local i: INTEGER; some_string: STRING, the C code for implementations of that routine will contain:

T2 _i=0;
T0* _some_string=(void*)0;

Additionally, Eiffel functions get an extra local variable in the C code, called R, which maps the special Eiffel variable Result. It is typed and initialized like any other local variable. Eiffel functions always get exactly one return statement in their C mapping, on their last line: return R;.

An exception to the above are once functions. As mentioned before, the result of such a function is a global variable. In that case, the R local is not declared, uses of Result in the Eiffel code are mapped to uses of the global variable, and the last line of the function is return oBCMname;.
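A sketch of a non-once function following these conventions, assuming a hypothetical STRING layout (id 7) reduced to a single count attribute and a hypothetical routine total_length in a type with id 40. Current is not used, so the C argument is omitted, as the text allows for such routines.

```c
#include <stdint.h>

typedef int Tid;
typedef int32_t T2;
typedef struct S0 T0;
struct S0 { Tid id; };

/* Hypothetical STRING layout (id 7), reduced to one attribute. */
typedef struct S7 T7;
struct S7 { Tid id; T2 _count; };

/* Eiffel (hypothetical):
     total_length (a, b: STRING): INTEGER
       local i: INTEGER; some_string: STRING
   Locals get an underscore, Result becomes R, one return at the end. */
T2 r40total_length(T0 *a1, T0 *a2)
{
    T2 R = 0;
    T2 _i = 0;
    T0 *_some_string = (void *)0;

    _some_string = a1;
    _i = ((T7 *)_some_string)->_count;
    _some_string = a2;
    R = _i + ((T7 *)_some_string)->_count;
    return R;
}
```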

Exception handling

The managed stack

Even though it is not required to implement the rescue construct, SmartEiffel maintains a "managed stack". The managed stack consists of extra information embedded in the execution stack, which is used to provide debugging and backtrace information; it is also used for assertion checking. The managed stack is used in all compilation modes except boost mode.

In C, a stack frame is allocated for every function call executed. The stack frame contains the function arguments, the local variables, and usually some machine-dependent and C-compiler-dependent information (for example, a return address). The order of these elements in the stack frame may also vary between platforms. SmartEiffel adds some local variables to each routine which hold metadata about the structure of those stack frames, for example:

  • which routine is the one corresponding to that frame
  • where are the locals and arguments located in the stack
  • where is each frame located in the call chain

This information is stored in a local variable of every routine called ds of type se_dump_stack. A variable size part of the information is stored in another local variable called locals, which is a stack allocated array of pointers. A typical stack frame might look like this:

The image above shows heap-allocated objects in blue, stack-allocated (expanded) objects in red, global runtime structures in green, and stack-allocated runtime info/pointers in white. The routine above uses Current in some way (because of the variable C). It has two arguments (a1, a2), the first of which is of an expanded type (shown in red). It is a function returning some expanded type (it has an R variable), and it has two local variables: local_exp, of an expanded type, and local_ref, of some reference type.

The local ds variable is initialized on routine entry. It has the following fields:

  • ds.fd points to the frame descriptor. The frame descriptor is a structure declared as a local static variable (so, it is globally allocated and shared) and contains information shared by all the stack frames of the same routine (a string with the Eiffel routine name and class, number of arguments, etc).
  • ds.current is initialized to &C. This allows code using the data structure to get the value of Current in this stack frame. Note that the pointer points to the location of the C variable, which holds the current object itself if Current is of an expanded type, but usually holds a pointer to the actual object. If the routine has no need for Current, this value is NULL.
  • ds.p is an integer value with an encoding of the position (file, line, column) inside the source Eiffel code which was last executed while this frame was active. See the next section for details.
  • ds.locals is initialized to &locals or to NULL in the cases where locals is not needed (see below).
  • ds.caller points to the ds local variable of the calling function. This value is passed in the caller argument that the compiler adds to each routine.
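The fields above can be modelled with a simplified C struct. In the sketch below, se_frame_descriptor, its members, and the functions frame_depth, leaf and root are hypothetical (the real declarations are part of the runtime in sys/runtime/c and contain more information). It shows ds being initialized on entry, its address being passed down as caller, and a traceback-style walk that follows the caller chain.

```c
#include <stddef.h>

/* Hypothetical, simplified frame descriptor (one static per routine). */
typedef struct se_frame_descriptor {
    const char *name;        /* Eiffel routine and class */
    int arg_count;
} se_frame_descriptor;

typedef struct se_dump_stack se_dump_stack;
struct se_dump_stack {
    se_frame_descriptor *fd; /* shared, static per routine */
    void *current;           /* &C, or NULL if Current is unused */
    int p;                   /* encoded source position */
    void **locals;           /* points at the locals array, or NULL */
    se_dump_stack *caller;   /* ds of the calling routine */
};

/* Counts managed frames by following the caller chain. */
int frame_depth(const se_dump_stack *ds)
{
    int depth = 0;
    for (; ds != NULL; ds = ds->caller)
        depth++;
    return depth;
}

/* A routine with no locals: ds.locals stays NULL. */
int leaf(se_dump_stack *caller, int *C)
{
    static se_frame_descriptor fd = {"leaf in EXAMPLE", 0};
    se_dump_stack ds;
    ds.fd = &fd;
    ds.current = &C;
    ds.p = 0;
    ds.locals = NULL;
    ds.caller = caller;
    return frame_depth(&ds);
}

/* A routine with one local, passing &ds down as the callee's caller. */
int root(se_dump_stack *caller, int *C)
{
    static se_frame_descriptor fd = {"root in EXAMPLE", 0};
    int _n = 0;
    void *locals[1];
    se_dump_stack ds;
    locals[0] = &_n;
    ds.fd = &fd;
    ds.current = &C;
    ds.p = 0;
    ds.locals = locals;
    ds.caller = caller;
    _n = leaf(&ds, C);
    return _n;
}
```

Calling root with a NULL caller gives a chain of two managed frames (root, then leaf).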

Encoding source positions

Recovering from exceptions

Printing tracebacks

Garbage Collection