Difference between revisions of "Generated c code"

From Liberty Eiffel Wiki

(English translation half done)

{{TranslationWanted}}

*** You can also use [http://SmartEiffel.loria.fr/man/c_code.html the original article]

People who want to interface with applications and/or libraries written in C should as far as possible limit themselves to the interfaces provided by [[Cecil]] and [[externals|external]]. This page gives details about the generated C code, but these details ought not to be of any use to you. ;-)

== The type identifiers ==

=== Description ===

SmartEiffel generates one unique identifier for each active type in the Eiffel code<sup>[[Generated c code#note1|1]]</sup>. Many symbols in the generated C code depend on that identifier.

'''Don't depend on those identifiers!''' The mangling table is only valid for one specific compilation of one specific application, with one specific version of the compiler ([[compile_to_c]]) and one specific version of the libraries. We do not guarantee the stability of these identifiers.

If ''27'' is an identifier, then:

* The C type of an Eiffel object is T27;
* The corresponding C structure is '''struct&nbsp;S27'''; in that structure, attribute names are prefixed with an underscore (there may be other fields used by SmartEiffel, in particular the <TT>id</TT> field, which here has the value 27).<br> Each reference type can be cast to <TT>T0</TT> (although some reference types may not have the <TT>id</TT> field).
* Each method is called '''r27''methodname''()'''
* Each prefix method is called '''r27_px_''methodname''()'''
* Each infix method is called '''r27_ix_''methodname''()'''
* Each late-binding method is called '''X27''methodname''()'''
* The object's creation method (when the [[Garbage collector|garbage collector]] is used) is called '''new27()'''
* The type model is a variable called '''M27'''. Models are used for initialisation. For example:
 T27 M27 = {27,NULL,NULL,0,0};
 ...
 {T27*n=((T27*)se_malloc(sizeof(*n)));
  *n=M27;
  r27make(n);
 ...

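To see how these names fit together, here is a minimal hand-written C sketch, not actual compiler output. It assumes a reference type with identifier 27 holding a single INTEGER attribute, plus an invented heir with identifier 28; the feature <TT>value</TT>, the attribute layout and the bodies are all hypothetical, and <TT>malloc</TT> stands in for <TT>se_malloc</TT>. It shows the model copy done by <TT>new27()</TT>, the cast to <TT>T0</TT>, and how an <TT>X27</TT> late-binding function could pick the right <TT>r</TT> function from the <TT>id</TT> field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hand-written imitation of the naming scheme; NOT compiler output.
   Type ids 27 and 28 are invented for illustration only. */
typedef int Tid;

/* T0: every reference type can be cast to it; only the id field is shared. */
typedef struct S0 { Tid id; } T0;

/* A reference type with id 27 and one INTEGER attribute (prefixed with '_'). */
typedef struct S27 { Tid id; int _count; } T27;
/* Another live type (id 28), e.g. an heir redefining the same feature. */
typedef struct S28 { Tid id; int _count; } T28;

/* Type models: copied into freshly allocated objects. */
T27 M27 = {27, 0};
T28 M28 = {28, 0};

/* Creation procedure and feature bodies (hypothetical). */
void r27make(T27 *C) { C->_count = 1; }
int r27value(T27 *C) { return C->_count; }
int r28value(T28 *C) { return C->_count * 10; }

/* Creation: allocate, copy the model, then call the creation procedure. */
T27 *new27(void) {
    T27 *n = (T27 *)malloc(sizeof(*n));
    assert(n != NULL);
    *n = M27;
    r27make(n);
    return n;
}

T28 *new28(void) {
    T28 *n = (T28 *)malloc(sizeof(*n));
    assert(n != NULL);
    *n = M28;
    n->_count = 2; /* stands in for a creation procedure */
    return n;
}

/* Late binding: the dynamic type is read from the id field, which is why
   targets of late-bound calls need that field. */
int X27value(T0 *C) {
    switch (C->id) {
    case 27: return r27value((T27 *)C);
    case 28: return r28value((T28 *)C);
    default: abort(); /* unknown dynamic type */
    }
}
```

Real generated code may also take a <TT>se_dump_stack*</TT> argument, as in the <TT>STRING</TT> example below.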
Some characters in method names such as <TT>"&lt;"</TT> or <TT>"+"</TT> will be replaced by their corresponding ASCII code in decimal.
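The rules above can be sketched as a small C name builder. This is an illustration under stated assumptions, not SmartEiffel's implementation: it assumes the parts are simply concatenated and that every character that is not a letter, digit or underscore is replaced by its decimal ASCII code; the functions <TT>mangle</TT> and <TT>append_feature</TT> are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch only: composes C names following the rules on this
   page. The exact algorithm used by SmartEiffel is not reproduced here. */

/* Appends the Eiffel feature name to `out`, replacing each character that is
   not valid in a C identifier with its decimal ASCII code. Stops early
   (truncating) if the buffer is too small. */
static void append_feature(char *out, size_t size, const char *name) {
    size_t len = strlen(out);
    for (const char *p = name; *p != '\0' && len + 4 < size; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c) || c == '_') {
            out[len++] = (char)c;
            out[len] = '\0';
        } else {
            /* at most 3 digits plus the NUL: fits thanks to the loop test */
            len += (size_t)snprintf(out + len, size - len, "%d", c);
        }
    }
}

/* kind: "r" for a routine, "X" for a late-binding function;
   affix: "" normally, "_px_" for a prefix, "_ix_" for an infix feature. */
void mangle(char *out, size_t size, const char *kind, int type_id,
            const char *affix, const char *feature) {
    assert(out != NULL && size > 0);
    snprintf(out, size, "%s%d%s", kind, type_id, affix);
    append_feature(out, size, feature);
}
```

Under these assumptions, the infix <TT>"+"</TT> of type 2 would become <TT>r2_ix_43</TT> and <TT>"<="</TT> would become <TT>r2_ix_6061</TT>; this was not checked against real compiler output.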
=== An example ===

For example, [[library_class:STRING|<TT>STRING</TT>]] has the identifier '''7'''<sup>[[Generated c code#note2|2]]</sup>. So:

* The object type is '''T7''':
 typedef struct S7 T7;
* The structure is defined in '''struct&nbsp;S7''':
 struct S7{Tid id;T9 _storage;T2 _count;T2 _capacity;};
* The <TT>append</TT> method becomes:
 void r7append(se_dump_stack*caller,T7* C,T0* a1)
(The <TT>caller</TT> argument belongs to the ''dump stack'' execution stack; see below for details.)

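As a hand-written approximation (not the code SmartEiffel generates), the <TT>S7</TT> layout and the <TT>r7append</TT> signature above could be used as follows. The definitions of <TT>Tid</TT>, <TT>T2</TT> and <TT>T9</TT> are stand-ins chosen for this sketch, <TT>se_dump_stack</TT> is left opaque, and the growth policy and the helper <TT>new7_from</TT> are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in types (assumptions, not SmartEiffel's definitions). */
typedef int Tid;              /* type of the id field                      */
typedef int T2;               /* INTEGER                                   */
typedef char *T9;             /* NATIVE_ARRAY[CHARACTER]: a raw buffer     */
typedef struct S0 { Tid id; } T0;
typedef struct S7 { Tid id; T9 _storage; T2 _count; T2 _capacity; } T7;
typedef struct se_dump_stack se_dump_stack; /* opaque; unused here         */

static const T7 M7 = {7, NULL, 0, 0};       /* model for new STRINGs       */

/* Grows the storage so that it can hold at least `needed` characters. */
static void r7ensure_capacity(T7 *C, T2 needed) {
    if (needed <= C->_capacity) return;
    T2 cap = C->_capacity > 0 ? C->_capacity : 8;
    while (cap < needed) cap *= 2;
    C->_storage = realloc(C->_storage, (size_t)cap);
    assert(C->_storage != NULL);
    C->_capacity = cap;
}

/* append: the argument arrives as T0*, since STRING is a reference type;
   this sketch assumes, without checking the id, that it is a T7. */
void r7append(se_dump_stack *caller, T7 *C, T0 *a1) {
    T7 *other = (T7 *)a1;
    (void)caller;
    if (other->_count == 0) return;
    r7ensure_capacity(C, C->_count + other->_count);
    memcpy(C->_storage + C->_count, other->_storage, (size_t)other->_count);
    C->_count += other->_count;
}

/* Hypothetical helper: builds a T7 from a C string (no trailing NUL kept). */
T7 *new7_from(const char *s) {
    T7 *n = (T7 *)malloc(sizeof(*n));
    assert(n != NULL);
    *n = M7;
    T2 len = (T2)strlen(s);
    r7ensure_capacity(n, len);
    if (len > 0) memcpy(n->_storage, s, (size_t)len);
    n->_count = len;
    return n;
}
```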
=== The <TT>.id</TT> file ===

When the application is compiled, the list of identifiers is stored in a file whose name ends with <TT>.id</TT>. This file is read again during incremental compilations, which keeps the identifiers somewhat stable as long as the whole project is not recompiled.

This file is made of ''entries'', separated from one another by a hash character (<TT>#</TT>). It looks like this:

 4 "REAL"
 #
 12 "HELLO_WORLD"
 class-path: "./hello_world.e"
 class-name: HELLO_WORLD
 parent-count: 0
 c-type: T12 reference: yes
 ref-status: live id-field: yes
 run-time-set-count: 1
 run-time-set:
         HELLO_WORLD (12)
 #
 9 "NATIVE_ARRAY[CHARACTER]"
 class-path: "/lib/se/lib/storage/low_level/native_array.e"
 class-name: NATIVE_ARRAY
 parent-count: 1 parents: 26
 c-type: T9 reference: no
 #
 21 "COMPARABLE"
 class-path: "/lib/se/lib/abilities/comparable.e"
 class-name: COMPARABLE
 parent-count: 1 parents: 13
 #
 . . .

You should never depend on these identifiers. When identifiers are computed, collisions may occur and have to be resolved. As a result, the identifier (and hence the C name) of each type depends not only on the type name, but also on the order in which the types are compiled, that is, on the order of ''application'' and ''library'' types combined. Identifiers also depend on the compilation mode (since it can change the set of active types) and on the version of the compiler you are using. So what is <TT>T145</TT> today may be <TT>T234</TT> tomorrow<sup>[[Generated c code#note3|3]]</sup>!

Consequently, '''do not ever rely on the generated identifiers, because they are not constant!''' Do not try to write in your own C code horrible things like <TT>new123</TT> or <TT>T456</TT>, because the only thing we can guarantee is that this code will not work.
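The following toy model illustrates why collision handling makes identifiers depend on compilation order. It is not SmartEiffel's algorithm: it hashes a name with a deliberately crude function (its length) and resolves collisions by taking the next free slot; <TT>id_table</TT> and <TT>assign_id</TT> are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT SmartEiffel's algorithm. */
#define SLOTS 16

typedef struct { const char *names[SLOTS]; } id_table;

/* Returns the id of `name`, assigning one on first sight; -1 if full. */
int assign_id(id_table *t, const char *name) {
    assert(t != NULL && name != NULL);
    int id = (int)(strlen(name) % SLOTS);          /* crude hash        */
    for (int tries = 0; tries < SLOTS; tries++, id = (id + 1) % SLOTS) {
        if (t->names[id] == NULL) {                /* free slot: take it */
            t->names[id] = name;
            return id;
        }
        if (strcmp(t->names[id], name) == 0)       /* already known     */
            return id;
    }
    return -1;
}
```

With this model, registering <TT>FOO</TT> then <TT>BAR</TT> gives them 3 and 4, while registering <TT>BAR</TT> first gives <TT>BAR</TT> the 3 and <TT>FOO</TT> the 4: same names, different identifiers.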

== Naming convention ==

The preceding section explained how method names are generated.

The prototype of the function <TT>r7append()</TT> from the example above is:
 void r7append(se_dump_stack*caller,T7* C,T0* a1);
This shows how ''Current'' and the arguments are passed. The rules are as follows:

* ''Current'' is called <TT>C</TT> and is always strongly typed (with its own exact type). This parameter may be omitted if ''Current'' is not used by the method. In some cases (when code is inlined), ''Current'' can be copied into local variables named <TT>C1</TT>, <TT>C2</TT>...
* The arguments are called <TT>a1</TT>, <TT>a2</TT>... and are typed <TT>T0*</TT> for reference types, or given their exact type for expanded types (e.g. <TT>T2</TT> for an integer).
* Inside functions, a local variable <TT>R</TT> is defined. This is ''Result''.
* Local variables keep their Eiffel name, prefixed by an underscore.

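A hypothetical example makes these conventions concrete. Assume class 27 has an INTEGER attribute <TT>count</TT> and the feature shown in the comment; the C function is written by hand following the rules above, is not real compiler output, and leaves out the <TT>se_dump_stack*</TT> caller argument to stay short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Eiffel feature of class 27:
       scaled_sum (limit: INTEGER; extra: ANY): INTEGER
           local
               i: INTEGER
           do
               from i := 1 until i > limit loop
                   Result := Result + i * count
                   i := i + 1
               end
               if extra /= Void then Result := Result + 1 end
           end
*/
typedef int T2;                               /* INTEGER (expanded)       */
typedef struct S0 { int id; } T0;             /* any reference            */
typedef struct S27 { int id; T2 _count; } T27;

/* Current is C with its exact type T27*; the INTEGER argument a1 keeps its
   exact type T2; the reference argument a2 is a T0*; Result is R; the
   Eiffel local i becomes _i. */
T2 r27scaled_sum(T27 *C, T2 a1, T0 *a2) {
    T2 R = 0;                                 /* Result defaults to 0     */
    T2 _i;
    assert(C != NULL);                        /* Current is never Void    */
    for (_i = 1; _i <= a1; _i++) {
        R = R + _i * C->_count;
    }
    if (a2 != NULL) {
        R = R + 1;
    }
    return R;
}
```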
== The mangling table ==

OK, now you understand why you cannot use type numbers, but you still want to know what the fields of the mangling table (the <TT>.id</TT> file) mean...

First, a big caveat. Although it has been stable for quite some time now, '''the mangling table format may change'''! We currently have no plans to change it, and we prefer keeping it the way it is. But once again, we do not commit ourselves to the current representation.

Let's look again at the extract of a <TT>.id</TT> file. The part shown covers nearly all the possible cases:

 4 "REAL"
 #
 12 "HELLO_WORLD"
 class-path: "./hello_world.e"
 class-name: HELLO_WORLD
 parent-count: 0
 c-type: T12 reference: yes
 ref-status: live id-field: yes
 run-time-set-count: 1
 run-time-set:
         HELLO_WORLD (12)
 #
 9 "NATIVE_ARRAY[CHARACTER]"
 class-path: "/lib/se/lib/storage/low_level/native_array.e"
 class-name: NATIVE_ARRAY
 parent-count: 1 parents: 26
 c-type: T9 reference: no
 #
 21 "COMPARABLE"
 class-path: "/lib/se/lib/abilities/comparable.e"
 class-name: COMPARABLE
 parent-count: 1 parents: 13
 #
 . . .

There is one entry per type (active or not); each entry spans one or more lines and is terminated by a hash character (<TT>#</TT>).

Each entry contains a lot of information. Not all of it is always present; missing fields take default values.

Only the first line is compulsory. It contains the type identifier and the type name (as would be returned by <TT>generating_type</TT>).

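For tools that only want to display this information (remember that the format may change), a minimal C sketch can split the file into entries and read the compulsory first line of each. The helper <TT>parse_id_headers</TT> and the <TT>id_entry</TT> record are hypothetical, and the sketch assumes the <TT>#</TT> separator stands alone on its line, as in the extract above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed entry header: the compulsory first line of an entry. */
typedef struct {
    int id;          /* type identifier, e.g. 12      */
    char name[128];  /* type name, e.g. "HELLO_WORLD" */
} id_entry;

/* Scans the text of a .id file line by line. A line made of a single '#'
   ends an entry; the first line after it (or the first line of the text)
   is parsed as: <id> "<name>". Returns the number of headers stored. */
int parse_id_headers(const char *text, id_entry *out, int max) {
    assert(text != NULL && out != NULL);
    int count = 0;
    int at_entry_start = 1;
    const char *line = text;
    while (*line != '\0' && count < max) {
        const char *end = strchr(line, '\n');
        size_t full = end ? (size_t)(end - line) : strlen(line);
        if (full == 1 && line[0] == '#') {
            at_entry_start = 1;              /* next line opens an entry */
        } else if (at_entry_start) {
            char buf[256];
            size_t len = full < sizeof buf ? full : sizeof buf - 1;
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%d \"%127[^\"]\"", &out[count].id,
                       out[count].name) == 2)
                count++;
            at_entry_start = 0;              /* other lines hold fields  */
        }
        line = end ? end + 1 : line + full;
    }
    return count;
}
```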
The following lines contain fields, each made of a keyword, a colon and a value. A single line may hold one or more fields. The fields are:

{|
|-
| '''class-path'''
| The path to the file containing the source code. May be omitted if the class has no associated file (uncommon).
|-
| '''class-name'''
| The name of the class, as returned by <TT>generator</TT>.
|-
| '''parent-count'''
| The number of parents.
|-
| '''parents'''
| On the same line as '''parent-count''' when the latter is not zero; the list of the identifiers of the parent classes.
|-
| '''c-type'''
| The C type, usually in the form '''T27'''. If it is omitted, the class has no runnable type.
In that case, the following fields do not appear either.
|-
| '''reference'''
| On the same line as '''c-type''': <TT>yes</TT> for a reference type, <TT>no</TT> for an expanded type.
|-
| '''ref-status'''
| Either <TT>live</TT> for an active type (i.e. instances of this type are created at run time), or <TT>dead</TT> otherwise.
|-
| '''id-field'''
| On the same line as '''ref-status''': <TT>yes</TT> if the <TT>id</TT> field has to be generated in the C structure (as its first element), <TT>no</TT> otherwise. The <TT>id</TT> field is generated if one of these conditions is true:
* some late binding may occur on targets of that type,
* or the structure may be accessed by an [[external]] or by [[cecil]].
Note that many calls are resolved statically; the type inference algorithm used in SmartEiffel increases the number of types that do not need the <TT>id</TT> field.
|-
| '''run-time-set-count'''
| The number of concrete, active descendants of the type (including itself). This is the number of items in '''run-time-set''' below.
|-
| '''run-time-set'''
| The concrete, active heirs of this type (including itself). One class per line, tab-indented.
|}

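The field lines can be split in the same spirit. This hypothetical sketch (<TT>parse_fields</TT>, <TT>field_or_default</TT> and <TT>id_field</TT> are invented names) assumes that a keyword is a blank-separated token ending with a colon and that the following tokens, up to the next keyword, form its value. The multi-line '''run-time-set''' list is not handled, and since the default values are not listed on this page, the caller supplies its own fallback.

```c
#include <assert.h>
#include <string.h>

#define KEY_MAX 32
#define VALUE_MAX 128

typedef struct {
    char key[KEY_MAX];      /* e.g. "c-type" */
    char value[VALUE_MAX];  /* e.g. "T12"    */
} id_field;

/* Splits one field line such as "c-type: T12 reference: yes" into
   keyword/value pairs; returns the number of pairs stored in `out`. */
int parse_fields(const char *line, id_field *out, int max) {
    assert(line != NULL && out != NULL);
    int count = 0;
    const char *p = line;
    for (;;) {
        while (*p == ' ' || *p == '\t') p++;             /* skip blanks  */
        if (*p == '\0' || *p == '\n') break;
        const char *tok = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n') p++;
        size_t len = (size_t)(p - tok);
        if (len > 1 && tok[len - 1] == ':') {            /* new keyword  */
            if (count == max) break;
            size_t klen = len - 1 < KEY_MAX ? len - 1 : KEY_MAX - 1;
            memcpy(out[count].key, tok, klen);
            out[count].key[klen] = '\0';
            out[count].value[0] = '\0';
            count++;
        } else if (count > 0) {                          /* value token  */
            char *v = out[count - 1].value;
            size_t used = strlen(v);
            size_t room = VALUE_MAX - 1 - used;          /* keep the NUL */
            if (used > 0 && room > 0) { v[used++] = ' '; room--; }
            size_t n = len < room ? len : room;
            memcpy(v + used, tok, n);
            v[used + n] = '\0';
        }
    }
    return count;
}

/* Looks a keyword up, returning `fallback` when the field is missing. */
const char *field_or_default(const id_field *f, int n, const char *key,
                             const char *fallback) {
    for (int i = 0; i < n; i++)
        if (strcmp(f[i].key, key) == 0) return f[i].value;
    return fallback;
}
```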
== The dump stack ==

'''When not in boost mode''', a stack is managed by the runtime environment generated by SmartEiffel. This stack is displayed when an uncaught exception is raised. It is also used by the debugger [[sedb]].

Technically, the SmartEiffel stack is built upon the native (C) stack. Each stack element is a <TT>se_dump_stack</TT><sup>[[Generated c code#note4|4]]</sup>, usually allocated on the C stack<sup>[[Generated c code#note5|5]]</sup>. It is made up of several parts:

* a static part: the frame descriptor, of type <TT>se_frame_descriptor</TT><sup>[[Generated c code#note4|4]]</sup>, which statically describes the ''feature'': type of ''Current'' and whether it is used, number and type of the locals (parameters and local variables), type of ''Result'', and an anti-recursion flag (used when checking contracts);
* a dynamic part:
** a pointer to ''Current'' (i.e. either a pointer to the expanded object, or a pointer to the reference, which is itself a pointer to the object). This is why the <TT>current</TT> field has type <TT>void**</TT>. This field may be <TT>NULL</TT> if ''Current'' is not used by the ''feature'',
** the position (used mainly by [[sedb]]),
** a pointer to the caller (i.e. the caller's <TT>se_dump_stack</TT>),
** a pointer to the origin of the exception: if it is not <TT>NULL</TT>, this <TT>se_dump_stack</TT> is ''not'' on the stack, but was allocated on the heap to preserve the exception trace,
** an array of pointers to the local variables (with the same double indirection as for ''Current''), hence the type <TT>void***</TT>.

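The double indirection can be illustrated with simplified, hypothetical stand-ins for these structures (the real definitions are in <TT>no_check.h</TT> and differ; every <TT>demo_*</TT> name is invented). The frame stores the address of the variable holding ''Current'' or a local, not the object itself, so whoever reads the frame sees the variable's present value. References are kept in <TT>void*</TT> variables so that the casts stay well-defined in portable C, and only the reference case is shown, not an expanded ''Current''.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for se_frame_descriptor (static part). */
typedef struct demo_frame_descriptor {
    const char *feature_name;  /* feature being executed          */
    int uses_current;          /* does the feature use Current?   */
    int local_count;           /* number of parameters and locals */
} demo_frame_descriptor;

/* Simplified stand-in for se_dump_stack (dynamic part). */
typedef struct demo_dump_stack {
    demo_frame_descriptor *fd;                 /* static part            */
    void **current;                            /* address of Current     */
    int position;                              /* e.g. for sedb          */
    struct demo_dump_stack *caller;            /* caller's frame         */
    struct demo_dump_stack *exception_origin;  /* non-NULL: heap copy    */
    void ***locals;                            /* addresses of locals    */
} demo_dump_stack;

typedef struct { int id; int _count; } demo_object;

/* Follows the double indirection: current -> reference -> object.
   Returns NULL when the feature does not use Current. */
demo_object *frame_current(const demo_dump_stack *ds) {
    assert(ds != NULL);
    return ds->current ? (demo_object *)*ds->current : NULL;
}

/* Same for the i-th local: locals[i] -> variable -> object. */
demo_object *frame_local(const demo_dump_stack *ds, int i) {
    assert(ds != NULL && i >= 0 && i < ds->fd->local_count);
    return (demo_object *)*ds->locals[i];
}
```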
Macros handle the links between the <TT>se_dump_stack</TT> elements.

* Usually, the top of the stack is the global variable <TT>se_dst</TT>, defined in <TT>SmartEiffel/sys/runtime/c/no_check.c</TT>. The macro <TT>set_dump_stack_top</TT> assigns its argument to this variable.
* With [[SCOOP]], each processor has its own stack. The macro <TT>set_dump_stack_top</TT> then takes two arguments: the processor and the new top.

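The linking can be sketched as follows, assuming nothing beyond the description above: <TT>demo_dst</TT> plays the role of <TT>se_dst</TT> and <TT>demo_set_dump_stack_top</TT> that of <TT>set_dump_stack_top</TT>, but both names and the frame layout are invented for this sketch. Each function pushes a frame that lives in its own C stack frame and restores the caller's frame before returning.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature of the dump-stack linking (not no_check.h). */
typedef struct demo_frame {
    const char *feature;
    struct demo_frame *caller;
} demo_frame;

static demo_frame *demo_dst = NULL;                 /* current top */
#define demo_set_dump_stack_top(f) (demo_dst = (f))

/* Walks from the top down to the root, as an exception trace would. */
int demo_depth(void) {
    int n = 0;
    for (const demo_frame *f = demo_dst; f != NULL; f = f->caller) n++;
    return n;
}

void demo_print_trace(FILE *out) {
    for (const demo_frame *f = demo_dst; f != NULL; f = f->caller)
        fprintf(out, "%s\n", f->feature);
}

int leaf(void) {
    demo_frame ds = { "leaf", demo_dst };   /* link to the caller   */
    demo_set_dump_stack_top(&ds);           /* push                 */
    int depth = demo_depth();
    demo_set_dump_stack_top(ds.caller);     /* pop before returning */
    return depth;
}

int root(void) {
    demo_frame ds = { "root", demo_dst };
    demo_set_dump_stack_top(&ds);
    int depth = leaf();
    demo_set_dump_stack_top(ds.caller);
    return depth;
}
```

The sketch only handles normal returns; as footnote 5 explains, when an exception is raised part of the stack is allocated on the heap before the <TT>longjmp</TT>, which is not modelled here.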
----
<div id="note1">1. There is a one-to-one correspondence between the identifier and the ''name'' of the type together with the values of its generic parameters (for generic classes). </div>

<div id="note2">2. Some identifiers are reserved for the "basic" types. They are:

* '''0''': the type to which any other type can be cast
* '''1''': [[library_class:INTEGER_8|<TT>INTEGER_8</TT>]]
* '''2''': [[library_class:INTEGER|<TT>INTEGER</TT>]] (or [[library_class:INTEGER_32|<TT>INTEGER_32</TT>]])
* '''3''': [[library_class:CHARACTER|<TT>CHARACTER</TT>]]
* '''4''': [[library_class:REAL|<TT>REAL</TT>]]
* '''5''': [[library_class:DOUBLE|<TT>DOUBLE</TT>]]
* '''6''': [[library_class:BOOLEAN|<TT>BOOLEAN</TT>]]
* '''7''': [[library_class:STRING|<TT>STRING</TT>]]
* '''8''': [[library_class:POINTER|<TT>POINTER</TT>]]
* '''9''': [[library_class:NATIVE_ARRAY|<TT>NATIVE_ARRAY</TT>]]<TT>[</TT>[[library_class:CHARACTER|<TT>CHARACTER</TT>]]<TT>]</TT>
* '''10''': [[library_class:INTEGER_16|<TT>INTEGER_16</TT>]]
* '''11''': [[library_class:INTEGER_64|<TT>INTEGER_64</TT>]]

Thus, the type of the root object will probably have the identifier '''12'''. But don't count on it too much (before the integer revolution, it was '''11'''...)
</div>

<div id="note3">3. The compiler does not change identifiers just for fun. The <TT>.id</TT> file is loaded at the beginning of the compilation, then saved at the end. But [[clean]], for example, deletes this file.</div>

<div id="note4">4. You can find the definitions of these structures in the file <TT>SmartEiffel/sys/runtime/c/no_check.h</TT>. </div>

<div id="note5">5. The only exception is when an exception is raised; in that case, part of the stack is allocated on the heap before the <TT>longjmp</TT>. </div>

Revision as of 13:15, 27 November 2005
