Expanded or reference

From Liberty Eiffel Wiki


*** Translation completed: Oliver Elphick 6/11/05 ***
*** A French speaker should check to be sure the intended ***
*** sense has been produced in English. ***

In our opinion, it is essential to understand how objects are represented in memory while an application is running. The aim is not to turn every reader into an expert on the subject, but simply to provide enough knowledge to make sound design choices. Moreover, being able to draw how objects are laid out in memory is often essential when explaining or thinking about the design of a new class. The section explaining the two categories of objects that exist at run time is essential reading. It is followed by a summary of the properties of expanded types, which goes into detail about certain points that are characteristic of these types.

It is also important to understand that every variable is automatically initialised according to its declared type.

The last section is of interest only to the few users who wish to define their own expanded classes; most beginners can skip it.

The objects manipulated during the execution of an Eiffel program are accessed either through an intermediate reference or directly, without one. Moreover, an object that is accessed through a reference can only ever be accessed through a reference. Likewise, an object that is accessed without a reference can only ever be accessed in that way, that is to say without any intermediate reference.

Thus there are two kinds of objects. Most classes correspond to objects that are handled through a reference. Less often, a class corresponds to objects that are handled without an intermediate pointer; in this case, the class declaration begins with the keyword expanded. In Eiffel jargon, an expanded class is a class whose objects are handled without an intermediate pointer: they are written directly, or expanded, into the memory area they occupy. Conversely, speaking of a reference class or a reference type emphasises that the objects in question are not expanded.

The main purpose of this distinction is to integrate the most basic entities into the object model. For example, a boolean value of class BOOLEAN is an expanded object. Similarly, the classes INTEGER and REAL are also expanded. The following diagram shows what happens in memory; the classes POINT and TRIANGLE are ordinary classes, that is to say classes defined without the keyword expanded:

Expanded vs reference memory structure

In the figure above, the variable a_boolean is declared with type BOOLEAN. Since the BOOLEAN class is expanded, the corresponding object is stored directly in the memory area associated with a_boolean. As the diagram shows, a_boolean currently contains the object True, which corresponds to the boolean value true. In the same way, the memory area associated with the variable a_integer of type INTEGER directly contains the object corresponding to the value 1, since the INTEGER class is also expanded. Finally, the variable a_real, declared with type REAL, also directly contains the value 1.0.

In the diagram above, the variable point is declared with type POINT. Since POINT is not an expanded class, the memory area associated with point does not contain the POINT object itself, but a pointer to that object: point references a POINT object. The diagram also shows that each POINT object is itself made up of two attributes, x and y, of type REAL. Likewise, the variable triangle is a reference to an object of class TRIANGLE, since TRIANGLE is an ordinary class, not an expanded one. Because the attributes p1, p2 and p3 are of type POINT, they too hold pointers to the corresponding POINT objects.

Whether or not a type is expanded, the syntax used to declare a variable of that type is the same: whether a class is expanded depends only on the class definition itself. For example, the variables a_boolean, a_integer, a_real, point and triangle are declared by the following Eiffel code:

a_boolean: BOOLEAN; a_integer: INTEGER; a_real: REAL; point: POINT; triangle: TRIANGLE

Whether or not one is dealing with an expanded object, the notation is just the same. For example, the following instruction copies into the variable a_real the value of the attribute y of the POINT object referenced by the variable point:

a_real := point.y

The following instruction copies into the variable point the pointer held in the p1 attribute of the TRIANGLE referenced by triangle; afterwards, point and triangle.p1 designate the same POINT object:

point := triangle.p1

Without going into all the details of assignments, the effect of those two instructions in relation to the memory diagram above is to produce this memory configuration:

Expanded vs reference memory structure
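The text of the POINT and TRIANGLE classes is not shown in this article. The following is a minimal sketch consistent with the diagrams; the creation procedures make and the root class LAYOUT_DEMO are hypothetical additions used only to build objects and run the two assignments discussed above. Each class goes in its own file (point.e, triangle.e, layout_demo.e), and the check instruction is evaluated only when the program is compiled with assertion checking enabled.

```eiffel
class POINT
   -- Ordinary (reference) class: two REAL attributes, as in the diagram.
create {ANY}
   make
feature {ANY}
   x, y: REAL
   make (a_x, a_y: REAL)
      do
         x := a_x
         y := a_y
      end
end -- class POINT

class TRIANGLE
   -- Ordinary class: p1, p2 and p3 hold references to POINT objects.
create {ANY}
   make
feature {ANY}
   p1, p2, p3: POINT
   make (a, b, c: POINT)
      do
         p1 := a
         p2 := b
         p3 := c
      end
end -- class TRIANGLE

class LAYOUT_DEMO
   -- Hypothetical root class running the two assignments from the text.
create {ANY}
   make
feature {ANY}
   make
      local
         a_real: REAL
         point, corner: POINT
         triangle: TRIANGLE
      do
         create point.make(1.0, 2.0)
         create corner.make(0.0, 0.0)
         create triangle.make(corner, point, point)
         a_real := point.y       -- copies the REAL value itself
         point := triangle.p1    -- copies a reference: no POINT is copied
         check
            a_real = 2.0
            point = corner       -- same object, hence same address
         end
      end
end -- class LAYOUT_DEMO
```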

Applying a method to an expanded object uses the same syntax as for an ordinary (not expanded) object. For example the following instruction applies the sqrt function to the INTEGER class object which is stored directly in the memory area associated with the variable a_integer. The result of calling the sqrt function is a REAL which overwrites the old object which was formerly stored directly in the variable a_real:

a_real := a_integer.sqrt

Having identical notation for handling ordinary (referenced) and expanded (not referenced) objects simplifies programming and provides consistency.

Being able to describe basic objects by a proper class is an important aid to consistency. A simple object, such as a 32-bit signed integer, is described by a proper class, the INTEGER class. As with any class, it is possible to change or adapt the methods of the INTEGER class, although nearly all users only consult the list of available methods. Since the INTEGER class is used by almost every program, altering it is a rare event that must be done with care. Nevertheless, having a proper class lets users look up the methods available for INTEGER in the same way as for any other class.

Among the predefined expanded classes corresponding to basic objects that must be familiar to all users are: BOOLEAN, INTEGER, REAL and CHARACTER. A well-informed user should also be familiar with the following expanded classes: INTEGER_8, INTEGER_16, INTEGER_32, INTEGER_64, REAL_32, REAL_64, REAL_80, REAL_128, REAL_EXTENDED and finally, for the most curious, there are the classes POINTER and NATIVE_ARRAY.

Principal characteristics of expanded type expressions

Expanded type expressions have properties that every user needs to know. As explained above, the object that corresponds to an expanded type expression cannot be referenced. No other location can directly designate the object in question; in particular, no pointer can point to it. So if, for example, one is dealing with a variable of type INTEGER, the only way of operating on the corresponding object is through that variable.

An expanded type expression always designates an object; in other words, an expanded expression is never Void. Moreover, an expanded type expression always corresponds to one and only one type of object: its dynamic type is always its static type. Consequently, dynamic dispatch is never involved when an expanded object is the target of a call (that is, on the left of a dot). Calling a method with an expanded object as target therefore corresponds to the most efficient possible direct call.

Without going into too much detail, since dynamic dispatch is impossible with an expanded object, there is no need to store in such an object the information used to find its dynamic type. Thus the memory required for an expanded object is exactly what is needed to hold its attributes.

Initialisation of variables and attributes

In Eiffel, all variables are always initialised automatically. This is the same for all kinds of variables: instance variables, local variables and the variable Result which is used to hold the result of a function. The way in which a variable is initialised depends entirely on its declaration type, which is either reference or expanded.

Initialisation of reference type variables

A variable whose type is a reference type is always automatically initialised with Void. For example, if you declare a variable of type POINT or type TRIANGLE, the non-expanded classes of the example above, the variable is automatically initialised with Void. For reference types, an object is never automatically created following the declaration of a variable.

Among the standard classes that correspond to reference types, consider the STRING class. This class is in no way a special case: a variable declared with type STRING is automatically initialised with Void, and no STRING object is created when the variable is declared. To take another common example, declaring a variable of type ARRAY[INTEGER] does not automatically create an array of INTEGERs, because ARRAY is not an expanded class either. This simple initialisation rule therefore applies to the majority of classes, which, it should be remembered, are ordinary, non-expanded classes.
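A minimal sketch of this rule, assuming a hypothetical root class REFERENCE_INIT_DEMO: both locals have reference types, so they start out Void, and objects only appear after explicit creation instructions. The check instructions are evaluated only when assertion checking is enabled at compile time.

```eiffel
class REFERENCE_INIT_DEMO
   -- Hypothetical root class: every local below has a reference type,
   -- so each one starts out Void and no object is created for it.
create {ANY}
   make
feature {ANY}
   make
      local
         name: STRING
         numbers: ARRAY[INTEGER]
      do
         check
            name = Void
            numbers = Void
         end
         -- Objects exist only after an explicit creation instruction.
         create name.make_empty
         create numbers.make(1, 3)
         check
            name.is_empty
            numbers.count = 3
         end
      end
end -- class REFERENCE_INIT_DEMO
```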


Initialisation of expanded type variables

Variables whose type is expanded are also initialised automatically. For certain truly basic expanded types, the initialisation is done directly by the compiler. After presenting all these special cases, we will deal with the general case of an expanded class.

For the family of INTEGERs, that is to say all the following types, {INTEGER_8, INTEGER_16, INTEGER_32, INTEGER, INTEGER_64}, the object used to initialise them corresponds to the value 0.

For the group of types {REAL_32, REAL_64, REAL, REAL_80, REAL_128, REAL_EXTENDED}, the object used to initialise them corresponds to the value 0.0.

For the BOOLEAN type, the value False is used to initialise an object.

For the CHARACTER type, the character with the ASCII code 0 is used to initialise it. This character is denoted by '%U' in Eiffel.

For the POINTER type, variable initialisation is done with the machine value that represents the null pointer. Since this value is not in normal use, there is no Eiffel notation for it. It is necessary to use the is_null method of the POINTER class to test whether an expression of this type corresponds to the null value.
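The special cases above can be summarised in a small sketch, assuming a hypothetical root class DEFAULT_VALUES_DEMO: none of its locals is assigned, so the check instruction simply states the compiler-supplied default of each type (evaluated only when assertion checking is enabled).

```eiffel
class DEFAULT_VALUES_DEMO
   -- Hypothetical root class: no local is assigned before the check,
   -- so each one holds the default value for its expanded type.
create {ANY}
   make
feature {ANY}
   make
      local
         a_integer: INTEGER
         a_integer_8: INTEGER_8
         a_real: REAL
         a_boolean: BOOLEAN
         a_character: CHARACTER
         a_pointer: POINTER
      do
         check
            a_integer = 0
            a_integer_8 = 0
            a_real = 0.0
            not a_boolean
            a_character = '%U'
            a_pointer.is_null   -- no literal exists for the null pointer
         end
      end
end -- class DEFAULT_VALUES_DEMO
```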

In the general case, that is to say for an expanded class that is not one of the special cases above, the initialisation is programmed by the class designer. In fact, an expanded class must have one and only one constructor with no argument. This creation procedure is automatically run to effect initialisation.
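A minimal sketch of the general case, using a hypothetical expanded class PERCENTAGE whose single argument-less creation procedure sets a non-default value. The client declares a local of this type without any creation instruction; by the rule just stated, the local has nonetheless been initialised by make. Each class goes in its own file, and the check is evaluated only with assertion checking enabled.

```eiffel
expanded class PERCENTAGE
   -- Hypothetical expanded class: its single argument-less creation
   -- procedure initialises every entity of this type.
create {ANY}
   make
feature {ANY}
   value: REAL
feature {}
   make
      do
         value := 100.0
      end
end -- class PERCENTAGE

class PERCENTAGE_DEMO
   -- Hypothetical root class.
create {ANY}
   make
feature {ANY}
   make
      local
         ratio: PERCENTAGE
      do
         -- No creation instruction: ratio was initialised by PERCENTAGE's make.
         check
            ratio.value = 100.0
         end
      end
end -- class PERCENTAGE_DEMO
```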

Some advice before you write an expanded class

For the great majority of applications, there is no need to define new expanded classes. As explained above, the principal use of expanded classes is to integrate the most basic objects into the object model. That said, an expanded class can occasionally be useful, either to provide a group of utility routines, or to save memory in the special case where a large number of small objects is used. Finally, we present the traps to avoid when designing an expanded class.

Grouping a collection of routines

With an object-oriented language, it is always most convenient, if possible, to put the object-handling methods directly in the class of the objects to be handled. In certain very special cases, it is not desirable, nor even possible, to keep to this basic rule.

The COLLECTION_SORTER class is a good example of an expanded class intended for grouping some routines which cannot be placed directly in the classes that correspond to the idea of a COLLECTION.

Although all the routines of the COLLECTION_SORTER class are used for sorting objects of the COLLECTION family, these methods cannot be placed directly in the COLLECTION class, for the simple reason that not all COLLECTIONs can be sorted: only COLLECTIONs whose elements are COMPARABLE can be. The COLLECTION_SORTER class lets us express this extra constraint on the generic parameter. COLLECTION_SORTER has no attributes; it is just a holder for methods. Furthermore, this class is expanded. So, to sort a whole array, for example, you can proceed as follows:

local
   sorter: COLLECTION_SORTER[INTEGER]
   array: ARRAY[INTEGER]
do
   array := <<1, 3, 2>>
   sorter.sort(array)
   -- array is now sorted in increasing order: 1, 2, 3
end

The point of using an expanded class here is that no space needs to be allocated in the heap for a COLLECTION_SORTER[INTEGER] object. You have no doubt noticed that the code above contains no creation instruction for the object associated with the variable sorter. Furthermore, since objects of this class have no attributes, the object corresponding to sorter does not even occupy any space on the stack! Here, using an expanded object gives better performance.

Economising on memory for many small objects

Another use of expanded classes is to save memory for small objects. By a small object we mean an object whose memory requirement is comparable to, or slightly larger than, the machine's pointer size. Suppose, for the sake of argument, that an object of class FOO is exactly the size of a machine pointer and that the application uses n objects of class FOO. We then need at least 2n pointer-sized memory locations: n for the references and n for the objects themselves. If we change the definition of the FOO class to make it expanded, we save n of those locations.
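The same arithmetic written out, with s the pointer size in bytes. As a worked instance only (illustrative values, not measurements), take a 64-bit machine, s = 8, and n = 10^6 objects. The reference figure is a lower bound, since a reference object may also carry the type information mentioned earlier.

```latex
\begin{aligned}
M_{\text{reference}} &\ge \underbrace{n\,s}_{\text{references}} + \underbrace{n\,s}_{\text{objects}}
  = 2n\,s = 2 \times 10^{6} \times 8 = 16 \times 10^{6}\ \text{bytes} \\
M_{\text{expanded}} &= n\,s = 10^{6} \times 8 = 8 \times 10^{6}\ \text{bytes} \\
M_{\text{reference}} - M_{\text{expanded}} &\ge n\,s = 8 \times 10^{6}\ \text{bytes}
\end{aligned}
```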

Be careful: to benefit from this gain, you must also be in the special case where dynamic dispatch is not needed for variables of type FOO. As indicated above, as soon as an expression is expanded, dynamic dispatch is no longer available.

As in the previous example, using an expanded class instead of an ordinary class saves memory. Note, however, that the objects in question can no longer be identified by their memory address: when the FOO type is expanded, comparing two variables of this type with the = operator no longer compares two addresses but compares the two objects themselves. You must also be aware of this last point before choosing to define an expanded class.
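A minimal sketch of the two meanings of =, assuming a hypothetical root class EQUALITY_DEMO. For brevity it uses the predefined expanded class INTEGER in place of an expanded FOO, and STRING as the reference type; twin (inherited from ANY) builds a second STRING object with the same contents. The check is evaluated only with assertion checking enabled.

```eiffel
class EQUALITY_DEMO
   -- Hypothetical root class contrasting = on expanded and reference types.
create {ANY}
   make
feature {ANY}
   make
      local
         a, b: INTEGER   -- expanded: = compares the objects themselves
         s, t: STRING    -- reference: = compares the addresses
      do
         a := 42
         b := 42
         s := "abc"
         t := s.twin     -- a distinct STRING object with the same contents
         check
            a = b            -- equal values
            s /= t           -- two different objects...
            s.is_equal(t)    -- ...whose contents are equal
         end
      end
end -- class EQUALITY_DEMO
```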

Traps to avoid with expanded classes

The first trap concerns classes with many attributes. When an expanded object is very large, passing it as an argument to a routine can be much more expensive than passing an ordinary object. For an ordinary object, only its memory address is copied onto the stack; for an expanded object, the object itself, that is to say all its attributes, is copied during argument passing. The same happens when an expanded variable is assigned, so for very large expanded objects, assignments can also degrade performance.
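The copy described here can be seen in a small sketch, assuming a hypothetical expanded class COORDINATES and root class COPY_DEMO. With only two attributes the cost is negligible; the point is that assignment copies the whole object, so the cost grows with the number of attributes. The exported set_x is there only for the demonstration, and the check is evaluated only with assertion checking enabled.

```eiffel
expanded class COORDINATES
   -- Hypothetical expanded class: assignment copies all its attributes.
create {ANY}
   make
feature {ANY}
   x, y: INTEGER
   set_x (a_x: INTEGER)
      do
         x := a_x
      end
feature {}
   make
      do
      end
end -- class COORDINATES

class COPY_DEMO
   -- Hypothetical root class.
create {ANY}
   make
feature {ANY}
   make
      local
         first, second: COORDINATES
      do
         first.set_x(10)
         second := first      -- copies the whole object, not an address
         second.set_x(20)     -- modifies only the copy
         check
            first.x = 10
            second.x = 20
         end
      end
end -- class COPY_DEMO
```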

The second trap is much more dangerous, because it does not merely risk slowing down the application. It concerns a misleading reading of a call to a modifying procedure applied to an expanded object obtained by reading an attribute through another object:

bar.foo.set_attribute(zoo)

is in fact equivalent to the following sequence of instructions:

temporary_foo := bar.foo             -- (1) copy expanded object foo
temporary_foo.set_attribute(zoo)     -- (2) modify the copy of foo

Even though foo is an attribute, its class is expanded, so the corresponding object is copied. The object you thought you were modifying with the procedure set_attribute is therefore not the one held in the attribute foo, but a temporary copy! Note that when foo is a function rather than an attribute, the transformation that introduces temporary_foo changes nothing, since the function result is a fresh value in any case. The same holds when foo's type is a reference type, because only the reference is copied.
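One way to make the intended modification explicit is to copy the attribute into a local, modify the local, and write it back through a setter. The sketch below assumes hypothetical classes FOO (expanded, with the set_attribute procedure from the text), BAR (with a hypothetical set_foo setter) and a root class TRAP_DEMO; it deliberately does not show the direct call bar.foo.set_attribute(...), whose handling may differ between compiler versions. The checks are evaluated only with assertion checking enabled.

```eiffel
expanded class FOO
   -- Hypothetical expanded class with one modifiable attribute.
create {ANY}
   make
feature {ANY}
   value: INTEGER
   set_attribute (v: INTEGER)
      do
         value := v
      end
feature {}
   make
      do
      end
end -- class FOO

class BAR
   -- Hypothetical reference class with an attribute of expanded type FOO.
create {ANY}
   make
feature {ANY}
   foo: FOO
   set_foo (f: FOO)
         -- Store a copy of `f' in the attribute `foo'.
      do
         foo := f
      end
feature {}
   make
      do
      end
end -- class BAR

class TRAP_DEMO
   -- Hypothetical root class.
create {ANY}
   make
feature {ANY}
   make
      local
         bar: BAR
         temporary_foo: FOO
      do
         create bar.make
         temporary_foo := bar.foo          -- (1) explicit copy
         temporary_foo.set_attribute(7)    -- (2) modify the copy
         check
            bar.foo.value = 0              -- the attribute is untouched
         end
         bar.set_foo(temporary_foo)        -- (3) write the copy back
         check
            bar.foo.value = 7
         end
      end
end -- class TRAP_DEMO
```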

Unfortunately, the current compiler does not yet warn about this pitfall. We plan to make the compiler flag this trap with a warning message, but this is not yet implemented. At the moment (August 2005) no solution has been chosen, but in future the compiler will probably require you to add the temporary variable explicitly, so that you are aware of the real problem. Since several solutions are being considered, including new restrictions on the definition of expanded classes, it is best for now to be cautious about defining new expanded classes.

To avoid falling into this subtle trap, it is preferable whenever possible to avoid exporting the modification methods of expanded classes. Best of all is not to have any modification methods at all. Note that all the following expanded basic classes respect this rule. There are no modification procedures in the classes: BOOLEAN, CHARACTER, INTEGER_8, INTEGER_16, INTEGER_32, INTEGER, INTEGER_64, REAL_32, REAL_64, REAL, REAL_80, REAL_128, REAL_EXTENDED and POINTER. So long as the compiler is unable to warn you of this possible problem, be careful when you use expanded classes.
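A minimal sketch of an expanded class that follows this advice, assuming a hypothetical class MONEY: clients get no modification procedure, only functions that return a new value. The modifier set_cents is exported to MONEY alone, so it can only be applied to Result inside the class, never to an attribute read through another object. MONEY_DEMO is a hypothetical root class; the check is evaluated only with assertion checking enabled.

```eiffel
expanded class MONEY
   -- Hypothetical expanded class with no modifier exported to clients.
create {ANY}
   make
feature {ANY}
   cents: INTEGER

   with_cents (c: INTEGER): MONEY
         -- A copy of Current holding `c' cents; Current is unchanged.
      do
         Result := Current
         Result.set_cents(c)
      end

   plus (other: MONEY): MONEY
         -- The sum of Current and `other'; neither is modified.
      do
         Result := Current
         Result.set_cents(cents + other.cents)
      end

feature {MONEY}
   set_cents (c: INTEGER)
         -- Modifier visible to MONEY only.
      do
         cents := c
      end

feature {}
   make
      do
      end

end -- class MONEY

class MONEY_DEMO
   -- Hypothetical root class.
create {ANY}
   make
feature {ANY}
   make
      local
         price, tax, total: MONEY
      do
         price := price.with_cents(1000)
         tax := tax.with_cents(200)
         total := price.plus(tax)
         check
            total.cents = 1200
            price.cents = 1000   -- plus did not modify price
         end
      end
end -- class MONEY_DEMO
```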