C code generation

From Liberty Eiffel Wiki
Revision as of 15:57, 24 February 2009 by SEWikiImport Dmoisset (talk) (Mapping Eiffel types to C types)

General aspects

Generated files

When compiling to C, SmartEiffel generates the following output:

  • One or more C files
  • A C header file, projectname.h.
  • A type-to-id mapping file (see #Mapping types to IDs section below).
  • A C compilation script (projectname.make or projectname.bat depending on the platform).

If you use the -no_split mode, all the C code is put in a single projectname.c file. Otherwise, the code is split into chunks of roughly equal size called projectnameN.c, where N is a positive integer. The number of chunks depends on the size of the project. The policy for splitting files will probably change in SE 2.4.

The compilation script will contain a list of commands to invoke the system C compiler and compile the relevant files with the proper flags (extracted from the command line, ACE file and/or plugin options). It will only include compile commands for those C files which have no associated object file, or that have changed since the previous compilation. It will also contain a final linking command (to generate the executable) when any of the relevant C flags has changed.

The SmartEiffel runtime (i.e., auxiliary routines that handle mechanisms such as garbage collection, exceptions, etc.) and plugins are not linked, but embedded instead. That means that the code for those components is copied into the generated .c/.h files. The runtime files are located in the sys/runtime/c directory of the compiler distribution. For example, when compiling with the GC enabled, the file sys/runtime/c/gc_lib.c is copied verbatim into one of the generated projectnameN.c files, and the file sys/runtime/c/gc_lib.h is copied verbatim into projectname.h.

Mapping types to IDs

When compiling to C, every type existing in the system is assigned an id, which is a unique positive integer number. This number is used in different parts of the C code instead of the actual type name. For example if the type STRING is assigned id 7, the feature is_equal of type STRING will be mapped to a C routine called r7is_equal (the 7 in the name comes from the type id).
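The naming pattern can be illustrated with a small sketch. The helper routine_c_name is hypothetical and not part of SmartEiffel; it only reproduces the "r", type id, feature name pattern described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of SmartEiffel): builds the C routine name
   for a feature of the type with the given id, following the
   "r<id><feature>" pattern. Returns 0 on success, -1 if the name does not
   fit in the buffer. */
static int routine_c_name(char *buf, size_t size, int type_id,
                          const char *feature)
{
    int n;
    assert(type_id > 0); /* type ids are positive integers */
    n = snprintf(buf, size, "r%d%s", type_id, feature);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With type id 7 and feature is_equal, the helper produces the name r7is_equal used in the example above.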

Note that ids are assigned to types and not to classes. Most of the time there is one type per class, but a class can give rise to several types. For example, even though ARRAY is a single class, ARRAY[INTEGER] and ARRAY[STRING] are distinct types, and as such get different ids.

A few of the types based on library classes (all INTEGER*, REAL*, CHARACTER, BOOLEAN, POINTER, NATIVE_ARRAY[CHARACTER] and STRING) have a predefined, fixed id. For details on the implementation of this fixed mapping, check the ID_PROVIDER class, especially its feature make.

After compilation, the mapping of types to ids is stored in a text file called project.id. This file is a useful aid when reading the generated C code. The compiler also uses it when recompiling, to keep the mapping stable between runs. A stable mapping keeps the generated code similar from one run to the next, which means that fewer C files need to be recompiled.

Mapping Eiffel types to C types

Not every type needs a direct C implementation. A lot of types are never instantiated, because they are deferred or just because the system never creates an instance of them. Types which are effectively used in the system are called "live" types. For each live type with id N, a C type is created called TN. For some of the standard base types, the definition of the type is hardcoded in the compiler; for example, in your project .h file you will find:

typedef int32_t T2; /* 2 is the id for INTEGER */

typedef double T5; /* 5 is the id for REAL_64 */

For references (which might be polymorphic), the C type T0 is defined, and references are of type T0 *.

For most other live types, a C struct is used, where fields of the struct correspond to the attributes (proper and inherited) of the type. The struct of the type is called struct SN. For example, if the type STD_OUTPUT has id 38, you will get the following code in the .h file:

typedef struct S38 T38;
/* ... */
struct S38{Tid id;T0* _filter;T2 _buffer_position;T9 _buffer;T2 _capacity;};

Note that each field has the name of the corresponding Eiffel attribute, preceded by an underscore (to avoid name clashes with C keywords, for example if a class has an attribute called "static" or "int"). If the attribute is of an expanded type, the field has the corresponding C type: in the struct above, buffer_position is declared as INTEGER, whose type has id 2, and buffer is of type T9 because id 9 is assigned to NATIVE_ARRAY[CHARACTER]. If the attribute is not of an expanded type, the field is declared with type T0 *.
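As a sketch of this field mapping, consider a hypothetical class ACCOUNT with an INTEGER attribute balance and a STRING attribute owner. The class, its type id 100, the accessor account_balance, and the definition of Tid as int are all assumptions made for illustration; only the naming pattern comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int Tid;        /* assumption: the real definition of Tid may differ */
typedef int32_t T2;     /* INTEGER */
typedef struct S0 T0;   /* target type of all references */
struct S0 { Tid id; };

/* Hypothetical type ACCOUNT with assumed type id 100. */
typedef struct S100 T100;
struct S100 {
    Tid id;        /* type id, always 100 for this struct */
    T2  _balance;  /* expanded attribute: stored inline with its own C type */
    T0 *_owner;    /* reference attribute (STRING): stored as T0* */
};

/* Hypothetical accessor: reads the expanded attribute through a typed pointer. */
static T2 account_balance(const T100 *a)
{
    assert(a != NULL && a->id == 100);
    return a->_balance;
}
```

The expanded attribute occupies space inside the struct itself, while the reference attribute only occupies one pointer, whatever the size of the object it refers to.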

Also note the additional field Tid id. This integer field holds the type id of the structure; in this case it should always be set to 38. The field identifies the type whenever a pointer (usually a T0 *) may point to more than one type of structure. That happens not only when the original source uses polymorphism, but also in some internal polymorphic functions in the stack-dump printing code and in the garbage collector. If the compiler decides that the id field is not needed (which happens often in boost mode with no GC, and also for expanded types), the field is omitted.

Now it is easy to explain the definition of type T0:

typedef struct S0 T0;
struct S0{Tid id;};
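The role of the id field can be sketched as follows. The function some_integer and the second type with id 100 are hypothetical, and Tid and T3 are assumed to be int and char; the dispatch code that the compiler actually generates may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int Tid;                 /* assumption */
typedef int32_t T2;              /* INTEGER */
typedef char T3;                 /* assumption: CHARACTER */
typedef T3 *T9;                  /* NATIVE_ARRAY[CHARACTER] */
typedef struct S0 T0;
struct S0 { Tid id; };

typedef struct S38 T38;          /* STD_OUTPUT, as in the example above */
struct S38 { Tid id; T0 *_filter; T2 _buffer_position; T9 _buffer; T2 _capacity; };

typedef struct S100 T100;        /* hypothetical second type */
struct S100 { Tid id; T2 _value; };

/* Every struct starts with the id field, so a T0* can be inspected to find
   out which struct it really points to before casting it back. */
static T2 some_integer(T0 *ref)
{
    assert(ref != NULL);
    switch (ref->id) {
    case 38:  return ((T38 *)ref)->_capacity;
    case 100: return ((T100 *)ref)->_value;
    default:  return -1; /* unknown type id */
    }
}
```

A caller passes any object as (T0 *)&object, and the function recovers the concrete type from the id alone.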

Native arrays have a special implementation. The C type is defined as a pointer to the element type. When the element type is a reference type, the mapping of that element type is T0 *, so:

  • NATIVE_ARRAY[CHARACTER], with CHARACTER having type id 3, will be mapped to a T3 *.
  • NATIVE_ARRAY[STRING], with STRING having type id 7, will be mapped to a T0 **.
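A sketch of these two mappings follows. It assumes that Tid is an int and that T3 is a plain char (the actual definitions in the generated header may differ). The id 200 for NATIVE_ARRAY[STRING] and the helper count_non_void are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int Tid;          /* assumption */
typedef char T3;          /* assumption: CHARACTER */
typedef struct S0 T0;
struct S0 { Tid id; };

typedef T3  *T9;          /* NATIVE_ARRAY[CHARACTER]: pointer to expanded elements */
typedef T0 **T200;        /* NATIVE_ARRAY[STRING] with a hypothetical id 200:
                             pointer to an array of references */

/* Hypothetical helper: counts the non-NULL references among the first
   `count` slots of a NATIVE_ARRAY[STRING]. */
static size_t count_non_void(T200 items, size_t count)
{
    size_t i, n = 0;
    assert(items != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (items[i] != NULL) {
            n++;
        }
    }
    return n;
}
```

Note that neither pointer type carries a length, so the helper has to receive the element count separately.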

For each type, there is also an initialization constant that holds the default value for objects of that type. The constant for the type with id N is called MN. For the standard base types, the constant has a hardcoded value and is defined as a macro:

#define M5 (0.0) /* 5 is the id for REAL_64 */

For generic instantiations of NATIVE_ARRAY, the default is also defined as a macro and is always NULL:

#define M9 NULL

For structures, the initial value is an extern variable declared in the .h file, with its value set in one of the .c files. The initial value sets the id field (if present) to the correct type id, and the other fields to their respective default values. For the example struct above, the initialization code is:

extern T38 M38; /* in the .h file */
 /* in the .c file */
T38 M38={38,(void*)0,0,(void*)0,0};
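One way to picture how such a constant could be used is to copy it into freshly allocated memory. The function new38 below is a hypothetical illustration built on plain malloc; it is not the allocation routine that SmartEiffel generates, and the definitions of Tid and T3 are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int Tid;                 /* assumption */
typedef int32_t T2;              /* INTEGER */
typedef char T3;                 /* assumption: CHARACTER */
typedef T3 *T9;                  /* NATIVE_ARRAY[CHARACTER] */
typedef struct S0 T0;
struct S0 { Tid id; };
typedef struct S38 T38;
struct S38 { Tid id; T0 *_filter; T2 _buffer_position; T9 _buffer; T2 _capacity; };

/* Initial value for type 38, as in the example above. */
T38 M38 = {38, (void *)0, 0, (void *)0, 0};

/* Hypothetical allocator: returns a new object whose id and attributes are
   copied from M38, or NULL if the allocation fails. */
static T38 *new38(void)
{
    T38 *obj = malloc(sizeof *obj);
    if (obj != NULL) {
        *obj = M38;              /* struct assignment copies every field */
        assert(obj->id == 38);
    }
    return obj;
}
```

A single struct assignment is enough to set the type id and all default values at once, which is what makes a prebuilt initial value convenient.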

(explain what happens when a type has no attributes)

Mapping Eiffel features to C code

Exception handling

Recovering from exceptions

Printing tracebacks

Garbage Collection