X-Git-Url: https://git.distorted.org.uk/~mdw/sod/blobdiff_plain/756e9293611d2b1dc34fba6cca89fd70098f5546..5cf27cf1ad4bd072c995c4fd8f353a6351c94e1f:/doc/concepts.tex diff --git a/doc/concepts.tex b/doc/concepts.tex index e301ae7..6b21109 100644 --- a/doc/concepts.tex +++ b/doc/concepts.tex @@ -63,22 +63,33 @@ having to implement additional syntax. For the most part, Sod takes a fairly traditional view of what it means to be an object system. -An \emph{object} maintains \emph{state} and exhibits \emph{behaviour}. An -object's state is maintained in named \emph{slots}, each of which can store a -C value of an appropriate (scalar or aggregate) type. An object's behaviour -is stimulated by sending it \emph{messages}. A message has a name, and may -carry a number of arguments, which are C values; sending a message may result -in the state of receiving object (or other objects) being changed, and a C -value being returned to the sender. - -Every object is a (direct) instance of some \emph{class}. The class -determines which slots its instances have, which messages its instances can -be sent, and which methods are invoked when those messages are received. The -Sod translator's main job is to read class definitions and convert them into -appropriate C declarations, tables, and functions. An object cannot +An \emph{object} maintains \emph{state} and exhibits \emph{behaviour}. +(Here, we're using the term `object' in the usual sense of `object-oriented +programming', rather than that of the ISO~C standard. Once we have defined +an `instance' below, we shall generally prefer that term, so as to prevent +further confusion between these two uses of the word.) + +An object's state is maintained in named \emph{slots}, each of which can +store a C value of an appropriate (scalar or aggregate) type. An object's +behaviour is stimulated by sending it \emph{messages}. 
A message has a name, +and may carry a number of arguments, which are C values; sending a message +may result in the state of the receiving object (or other objects) being changed, +and a C value being returned to the sender. + +Every object is a \emph{direct instance} of exactly one \emph{class}. The +class determines which slots its instances have, which messages its instances +can be sent, and which methods are invoked when those messages are received. +The Sod translator's main job is to read class definitions and convert them +into appropriate C declarations, tables, and functions. An object cannot (usually) change its direct class, and the direct class of an object is not affected by, for example, the static type of a pointer to it. +If an object~$x$ is a direct instance of some class~$C$, then we say that $C$ +is \emph{the class of}~$x$. Note that the class of an object is a property +of the object's value at runtime, and not of C's compile-time type system. +We shall be careful in distinguishing C's compile-time notion of \emph{type} +from Sod's run-time notion of \emph{class}. + \subsection{Superclasses and inheritance} \label{sec:concepts.classes.inherit} @@ -120,11 +131,11 @@ class then the only superclass of $C$ is $C$ itself, and $C$ has no proper superclasses. If an object is a direct instance of class~$C$ then the object is also an -(indirect) instance of every superclass of $C$. +(indirect) \emph{instance} of every superclass of $C$. -If $C$ has a proper superclass $B$, then $B$ is not allowed to have $C$ has a -direct superclass. In different terms, if we construct a graph, whose -vertices are classes, and draw an edge from each class to each of its direct +If $C$ has a proper superclass $B$, then $B$ must not have $C$ as a direct +superclass. In different terms, if we construct a directed graph, whose +nodes are classes, and draw an arc from each class to each of its direct superclasses, then this graph must be acyclic.
In yet other terms, the `is a superclass of' relation is a partial order on classes. @@ -148,7 +159,7 @@ following properties are expected to hold. list, i.e., $B$ is a more specific superclass of $C$ than $A$ is. \end{itemize} The default linearization algorithm used in Sod is the \emph{C3} algorithm, -which has a number of good properties described in~\cite{FIXME:C3}. +which has a number of good properties described in~\cite{Barrett:1996:MSL}. It works as follows. \begin{itemize} \item A \emph{merge} of some number of input lists is a single list @@ -170,6 +181,92 @@ It works as follows. class whose subclass appears earliest in $C$'s local precedence order. \end{itemize} +\begin{figure} + \centering + \begin{tikzpicture}[x=7.5mm, y=-14mm, baseline=(current bounding box.east)] + \node[lit] at ( 0, 0) (R) {SodObject}; + \node[lit] at (-3, +1) (A) {A}; \draw[->] (A) -- (R); + \node[lit] at (-1, +1) (B) {B}; \draw[->] (B) -- (R); + \node[lit] at (+1, +1) (C) {C}; \draw[->] (C) -- (R); + \node[lit] at (+3, +1) (D) {D}; \draw[->] (D) -- (R); + \node[lit] at (-2, +2) (E) {E}; \draw[->] (E) -- (A); + \draw[->] (E) -- (B); + \node[lit] at (+2, +2) (F) {F}; \draw[->] (F) -- (A); + \draw[->] (F) -- (D); + \node[lit] at (-1, +3) (G) {G}; \draw[->] (G) -- (E); + \draw[->] (G) -- (C); + \node[lit] at (+1, +3) (H) {H}; \draw[->] (H) -- (F); + \node[lit] at ( 0, +4) (I) {I}; \draw[->] (I) -- (G); + \draw[->] (I) -- (H); + \end{tikzpicture} + \quad + \vrule + \quad + \begin{minipage}[c]{0.45\hsize} + \begin{nprog} + class A: SodObject \{ \}\quad\=@/* @|A|, @|SodObject| */ \\ + class B: SodObject \{ \}\>@/* @|B|, @|SodObject| */ \\ + class C: SodObject \{ \}\>@/* @|C|, @|SodObject| */ \\ + class D: SodObject \{ \}\>@/* @|D|, @|SodObject| */ \\+ + class E: A, B \{ \}\quad\=@/* @|E|, @|A|, @|B|, \dots */ \\ + class F: A, D \{ \}\>@/* @|F|, @|A|, @|D|, \dots */ \\+ + class G: E, C \{ \}\>@/* @|G|, @|E|, @|A|, + @|B|, @|C|, \dots */ \\ + class H: F \{ \}\>@/* @|H|, @|F|, @|A|, @|D|,
\dots */ \\+ + class I: G, H \{ \}\>@/* @|I|, @|G|, @|E|, @|H|, @|F|, + @|A|, @|B|, @|C|, @|D|, \dots */ + \end{nprog} + \end{minipage} + + \caption{An example class graph and class precedence lists} + \label{fig:concepts.classes.cpl-example} +\end{figure} + +\begin{example} + Consider the class relationships shown in + \xref{fig:concepts.classes.cpl-example}. + + \begin{itemize} + + \item @|SodObject| has no proper superclasses. Its class precedence list + is therefore simply $\langle @|SodObject| \rangle$. + + \item In general, if $X$ is a direct subclass only of $Y$, and $Y$'s class + precedence list is $\langle Y, \ldots \rangle$, then $X$'s class + precedence list is $\langle X, Y, \ldots \rangle$. This explains $A$, + $B$, $C$, $D$, and $H$. + + \item $E$'s list is found by merging its local precedence list $\langle E, + A, B \rangle$ with the class precedence lists of its direct superclasses, + which are $\langle A, @|SodObject| \rangle$ and $\langle B, @|SodObject| + \rangle$. Clearly, @|SodObject| must be last, and $E$'s local precedence + list orders the rest, giving $\langle E, A, B, @|SodObject| \rangle$. + $F$ is similar. + + \item We determine $G$'s class precedence list by merging the three lists + $\langle G, E, C \rangle$, $\langle E, A, B, @|SodObject| \rangle$, and + $\langle C, @|SodObject| \rangle$. The class precedence list begins + $\langle G, E, \ldots \rangle$, but the individual lists don't order $A$ + and $C$. Comparing these to $G$'s direct superclasses, we see that $A$ + is a superclass of $E$, while $C$ is a superclass of -- indeed equal to + -- $C$; so $A$ must precede $C$, as must $B$, and the final list is + $\langle G, E, A, B, C, @|SodObject| \rangle$. + + \item Finally, we determine $I$'s class precedence list by merging $\langle + I, G, H \rangle$, $\langle G, E, A, B, C, @|SodObject| \rangle$, and + $\langle H, F, A, D, @|SodObject| \rangle$.
The list begins $\langle I, + G, \ldots \rangle$, and then we must break a tie between $E$ and $H$; but + $E$ is a superclass of $G$, so $E$ wins. Next, $H$ and $F$ must precede + $A$, since these are ordered by $H$'s class precedence list. Then $B$ + and $C$ precede $D$, since the former are superclasses of $G$, and the + final list is $\langle I, G, E, H, F, A, B, C, D, @|SodObject| \rangle$. + + \end{itemize} + + (This example combines elements from \cite{Barrett:1996:MSL} and + \cite{Ducournau:1994:PMM}.) +\end{example} + \subsubsection{Class links and chains} The definition for a class $C$ may distinguish one of its proper superclasses as being the \emph{link superclass} for class $C$. Not every class need have @@ -190,6 +287,7 @@ class in a chain is called the \emph{chain head}; the most specific class is the \emph{chain tail}. Chains are often named after their chain head classes. + \subsection{Names} \label{sec:concepts.classes.names} @@ -240,9 +338,16 @@ the @|me| pointer: in an initializer for a slot defined by a class $C$, @|me| has type `pointer to $C$'. (Note that the type of @|me| depends only on the class which defined the slot, not the class which defined the initializer.) +A class can also define \emph{class slot initializers}, which provide values +for a slot defined by its metaclass; see \xref{sec:concepts.metaclasses} for +details. + \subsection{C language integration} \label{sec:concepts.classes.c} +It is very important to distinguish compile-time C \emph{types} from Sod's +run-time \emph{classes}: see \xref{sec:concepts.classes}. + For each class~$C$, the Sod translator defines a C type, the \emph{class type}, with the same name. This is the usual type used when considering an object as an instance of class~$C$. No entire object will normally have a @@ -252,6 +357,10 @@ class type,\footnote{% chains. See \xref{sec:structures.layout} for the full details.} % so access to instances is almost always via pointers. 
+Usually, a value of type pointer-to-class-type of class~$C$ will point into +an instance of class $C$. However, clever (or foolish) use of pointer +conversions can invalidate this relationship. + \subsubsection{Access to slots} The class type for a class~$C$ is actually a structure. It contains one member for each class in $C$'s superclass chain, named with that class's @@ -259,8 +368,14 @@ nickname. Each of these members is also a structure, containing the corresponding class's slots, one member per slot. There's nothing special about these slot members: C code can access them in the usual way. -For example, if @|MyClass| has the nickname @|mine|, and defines a slot @|x| -of type @|int|, then the simple function +For example, given the definition +\begin{prog} + [nick = mine] \\ + class MyClass: SodObject \{ \\ \ind + int x; \-\\ + \} +\end{prog} +the simple function \begin{prog} int get_x(MyClass *m) \{ return (m@->mine.x); \} \end{prog} @@ -271,9 +386,56 @@ slots. If you want to hide implementation details, the best approach is to stash them in a dynamically allocated private structure, and leave a pointer to it in a slot. (This will also help preserve binary compatibility, because the private structure can grow more members as needed. See -\xref{sec:fixme.compatibility} for more details.) +\xref{sec:concepts.compatibility} for more details.) + +Slots defined by $C$'s link superclass, or any other superclass in the same +chain, can be accessed in the same way. Slots defined by other superclasses +can't be accessed directly: the instance pointer must be \emph{converted} to +point to a different chain. See the subsection `Conversions' below. + + +\subsubsection{Sending messages} +Sod defines a macro for each message. If a class $C$ defines a message $m$, +then the macro is called @|$C$_$m$|. 
The macro takes a pointer to the +receiving object as its first argument, followed by the message arguments, if +any, and returns the value returned by the object's effective method for the +message (if any). If you have a pointer to an instance of any of $C$'s +subclasses, then you can send it the message; it doesn't matter whether the +subclass is on the same chain. Note that the receiver argument is evaluated +twice, so it's not safe to write a receiver expression which has +side-effects. + +For example, suppose we defined +\begin{prog} + [nick = soupy] \\ + class Super: SodObject \{ \\ \ind + void msg(const char *m); \-\\ + \} \\+ + class Sub: Super \{ \\ \ind + void soupy.msg(const char *m) + \{ printf("sub sent `\%s'@\\n", m); \} \-\\ + \} +\end{prog} +then we can send the message like this: +\begin{prog} + Sub *sub = /* \dots\ */; \\ + Super_msg(sub, "hello"); +\end{prog} -\subsubsection{Vtables} +What happens under the covers is as follows. The structure pointed to by the +instance pointer has a member named @|_vt|, which points to a structure +called a `virtual table', or \emph{vtable}, which contains various pieces of +information about the object's direct class and layout, and holds pointers to +method entries for the messages which the object can receive. The +message-sending macro in the example above expands to something similar to +\begin{prog} + sub@->_vt@->sub.msg(sub, "hello"); +\end{prog} + +The vtable contains other useful information, such as a pointer to the +instance's direct class's \emph{class object} (described below). The full +details of the contents and layout of vtables are given in +\xref{sec:structures.layout.vtable}. \subsubsection{Class objects} @@ -288,8 +450,10 @@ and its type is usually @|SodClass|; @|SodClass|'s nickname is @|cls|. A class object's slots contain or point to useful information, tables and functions for working with that class's instances.
(The @|SodClass| class -doesn't define any messages, so it doesn't have any methods. In Sod, a class -slot containing a function pointer is not at all the same thing as a method.) +doesn't define any messages, so it doesn't have any methods other than for +the @|SodObject| lifecycle messages @|init| and @|teardown|; see +\xref{sec:concepts.lifecycle}. In Sod, a class slot containing a function +pointer is not at all the same thing as a method.) \subsubsection{Conversions} Suppose one has a value of type pointer-to-class-type for some class~$C$, and @@ -314,7 +478,7 @@ There are three main cases to distinguish. conversion can fail: the object in question might not be an instance of~$B$ after all. The macro \descref{SOD_CONVERT}{mac} and the function \descref{sod_convert}{fun} perform general conversions. They return a null - pointer if the conversion fails. (There are therefore your analogue to the + pointer if the conversion fails. (These are therefore your analogue to the \Cplusplus\ @|dynamic_cast<>| operator.) \end{itemize} The Sod translator generates macros for performing both in-chain and @@ -370,6 +534,13 @@ Keyword arguments can be provided in three ways. call, which is useful when writing wrapper functions. \end{enumerate} +Perhaps surprisingly, keyword arguments have a relatively small performance +impact. On the author's aging laptop, a call to a simple function, passing +two out of three keyword arguments, takes about 30 cycles longer than calling +a standard function which just takes integer arguments. On the other hand, +quite a lot of code is involved in decoding keyword arguments, so code size +will naturally suffer. + Keyword arguments are provided as a general feature for C functions. However, Sod has special support for messages which accept keyword arguments (\xref{sec:concepts.methods.keywords}); and they play an essential rôle in @@ -463,9 +634,8 @@ constructed: the vtables contain null pointers in place of pointers to method entry functions. 
\begin{figure} - \begin{tikzpicture} - [>=stealth, thick, - order/.append style={color=green!70!black}, + \hbox to\hsize{\hss\hbox{\begin{tikzpicture} + [order/.append style={color=green!70!black}, code/.append style={font=\sffamily}, action/.append style={font=\itshape}, method/.append style={rectangle, draw=black, thin, fill=blue!30, @@ -544,7 +714,7 @@ entry functions. {Least to \\ most \\ specific}; \draw [<-] ($(fn.north west) + (6mm, 1mm)$) -- ++(-8mm, 8mm); - \end{tikzpicture} + \end{tikzpicture}}\hss} \caption{The standard method combination} \label{fig:concepts.methods.stdmeth} @@ -677,9 +847,7 @@ There is also a @|custom| aggregating method combination, which is described in \xref{sec:fixme.custom-aggregating-method-combination}. -\subsection{Sending messages in C} \label{sec:concepts.methods.c} - -Each instance is associated with its direct class [FIXME] +\subsection{Method entries} \label{sec:concepts.methods.entry} The effective methods for each class are determined at translation time, by the Sod translator. For each effective method, one or more \emph{method @@ -719,11 +887,10 @@ While method combinations may set their own rules, usually keyword methods can only be defined on keyword messages, and all methods defined on a keyword message must be keyword methods. The direct methods defined on a keyword message may differ in the keywords they accept, both from each other, and -from the message. If two superclasses of some common class both define -keyword methods on the same message, and the methods both accept a keyword -argument with the same name, then these two keyword arguments must also have -the same type. Different applicable methods may declare keyword arguments -with the same name but different defaults; see below. +from the message. If two applicable methods on the same message both accept +a keyword argument with the same name, then these two keyword arguments must +also have the same type. 
Different applicable methods may declare keyword +arguments with the same name but different defaults; see below. The keyword arguments acceptable in a message sent to an object are the keywords listed in the message definition, together with all of the keywords @@ -847,8 +1014,8 @@ other method takes steps to prevent this. Slots are initialized in a well-defined order. \begin{itemize} -\item Slots defined by a more specific superclasses are initialized after - slots defined by a less specific superclass. +\item Slots defined by a more specific superclass are initialized after slots + defined by a less specific superclass. \item Slots defined by the same class are initialized in the order in which their definitions appear. \end{itemize} @@ -993,6 +1160,111 @@ determined using the \descref{SOD_INSTBASE}[macro]{mac}. %%%-------------------------------------------------------------------------- \section{Metaclasses} \label{sec:concepts.metaclasses} +In Sod, every object is an instance of some class, and -- unlike, say, +\Cplusplus\ -- classes are proper objects. It follows that, in Sod, every +class~$C$ is itself an instance of some class~$M$, which is called $C$'s +\emph{metaclass}. Metaclass instances are usually constructed statically, at +compile time, and marked read-only. + +As an added complication, Sod classes, and other metaobjects such as +messages, methods, slots and so on, also have classes \emph{at translation +time}. These translation-time metaclasses are not Sod classes; they are CLOS +classes, implemented in Common Lisp. + + +\subsection{Runtime metaclasses} +\label{sec:concepts.metaclasses.runtime} + +Like other classes, metaclasses can declare messages, and define slots and +methods. Slots defined by the metaclass are called \emph{class slots}, as +opposed to \emph{instance slots}. 
Similarly, messages and methods defined by +the metaclass are termed \emph{class messages} and \emph{class methods} +respectively, though these are used much less frequently. + +\subsubsection{The braid} +Every object is an instance of some class. There are only finitely many +classes. + +\begin{figure} + \centering + \begin{tikzpicture} + \node[lit] (obj) {SodObject}; + \node[lit] (cls) [right=10mm of obj] {SodClass}; + \draw [->, dashed] (obj) to[bend right] (cls); + \draw [->] (cls) to[bend right] (obj); + \draw [->, dashed] (cls) to[loop right] (cls); + \end{tikzpicture} + \qquad + \fbox{\ \begin{tikzpicture} + \node (subclass) {subclass of}; + \node (instance) [below=\jot of subclass] {instance of}; + \draw [->] ($(subclass.west) - (10mm, 0)$) -- ++(8mm, 0); + \draw [->, dashed] ($(instance.west) - (10mm, 0)$) -- ++(8mm, 0); + \end{tikzpicture}} + \caption{The Sod braid} \label{fig:concepts.metaclasses.braid} +\end{figure} + +Consider the directed graph whose nodes are classes, and where there is an +arc from $C$ to $D$ if and only if $C$ is an instance of $D$. There are only +finitely many nodes. Every node has an arc leaving it, because every object +-- and hence every class -- is an instance of some class. Therefore this +graph must contain at least one cycle. + +In Sod, this situation is resolved in the simplest manner possible: +@|SodClass| is the only predefined metaclass, and it is an instance of +itself. The only other predefined class is @|SodObject|, which is also an +instance of @|SodClass|. There is exactly one root class, namely +@|SodObject|; consequently, @|SodClass| is a direct subclass of @|SodObject|. + +\Xref{fig:concepts.metaclasses.braid} shows a diagram of this situation. + +\subsubsection{Class slots and initializers} +Instance initializers were described in \xref{sec:concepts.classes.slots}. A +class can also define \emph{class initializers}, which provide values for +slots defined by its metaclass. 
The initial value for a class slot is +determined as follows. +\begin{itemize} +\item Nonstandard slot classes may be initialized by custom Lisp code. For + example, all of the slots defined by @|SodClass| are of this kind. User + initializers are not permitted for such slots. +\item If the class or any of its superclasses defines a class initializer for + the slot, then the class initializer defined by the most specific such + superclass is used. +\item Otherwise, if the metaclass or one of its superclasses defines an + instance initializer, then the instance initializer defined by the most + specific such class is used. +\item Otherwise there is no initializer, and an error will be reported. +\end{itemize} +Initializers for class slots must be constant expressions (for scalar slots) +or aggregate initializers containing constant expressions. + +\subsubsection{Metaclass selection and consistency} +Sod enforces a \emph{metaclass consistency rule}: if $C$ has metaclass $M$, +then any subclass of $C$ must have a metaclass which is a subclass of $M$. + +The definition of a new class can name the new class's metaclass explicitly, +by defining a @|metaclass| property; the Sod translator will verify that the +choice of metaclass is acceptable. + +If no @|metaclass| property is given, then the translator will select a +default metaclass as follows. Let $C_1$, $C_2$, \dots, $C_n$ be the direct +superclasses of the new class, and let $M_1$, $M_2$, \dots, $M_n$ be their +respective metaclasses (not necessarily distinct). If there exists exactly +one minimal metaclass $M_i$, i.e., there exists an $i$, with $1 \le i \le n$, +such that $M_i$ is a subclass of every $M_j$, for $1 \le j \le n$, then $M_i$ +is selected as the new class's metaclass. Otherwise the situation is +ambiguous and an error will be reported. Usually, the ambiguity can be +resolved satisfactorily by defining a new class $M^*$ as a direct subclass of +the minimal $M_j$.
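The default metaclass selection rule described above can be illustrated with a small sketch. This is only a toy model in Python, not the Sod translator's code (which is written in Common Lisp): the tables @|DIRECT_SUPERS| and @|METACLASS_OF|, the classes @|A|, @|B|, @|MetaA| and @|MetaB|, and all of the function names are hypothetical, introduced purely for illustration.

```python
# Toy class graph: each class name maps to its direct superclasses.
# Only SodObject and SodClass come from the text; the rest is hypothetical.
DIRECT_SUPERS = {
    "SodObject": [],
    "SodClass": ["SodObject"],
    "MetaA": ["SodClass"],
    "MetaB": ["SodClass"],
    "A": ["SodObject"],
    "B": ["SodObject"],
}

# Each class name maps to its metaclass (again, a hypothetical table).
METACLASS_OF = {
    "SodObject": "SodClass",
    "SodClass": "SodClass",
    "MetaA": "SodClass",
    "MetaB": "SodClass",
    "A": "MetaA",
    "B": "MetaB",
}


def superclasses(cls):
    """Return the set of all superclasses of cls, including cls itself."""
    seen = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(DIRECT_SUPERS[c])
    return seen


def is_subclass(sub, sup):
    """True if sub is a (not necessarily proper) subclass of sup."""
    return sup in superclasses(sub)


def default_metaclass(direct_supers):
    """Pick the default metaclass for a new class with these direct superclasses.

    The candidates are the metaclasses M_1, ..., M_n of the direct
    superclasses.  A candidate is selected if it is a subclass of every
    candidate; if there is no such candidate, the choice is ambiguous.
    """
    metas = []
    for c in direct_supers:
        m = METACLASS_OF[c]
        if m not in metas:          # the M_i need not be distinct
            metas.append(m)
    minimal = [m for m in metas
               if all(is_subclass(m, other) for other in metas)]
    if len(minimal) != 1:
        raise ValueError("ambiguous metaclass: " + ", ".join(metas))
    return minimal[0]


if __name__ == "__main__":
    print(default_metaclass(["SodObject", "A"]))
```

In this model, a new class with direct superclasses @|A| and @|B| is rejected as ambiguous, because neither @|MetaA| nor @|MetaB| is a subclass of the other. Because the subclass relation is a partial order, at most one distinct candidate can be a subclass of all the others, so the sketch only has to distinguish the cases of one such candidate and none.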
+ + +\subsection{Translation-time metaobjects} +\label{sec:concepts.metaclasses.compile-time} + + + +\fixme{unwritten} + %%%-------------------------------------------------------------------------- \section{Compatibility considerations} \label{sec:concepts.compatibility} @@ -1007,7 +1279,7 @@ rather fragile binary interfaces.\footnote{% compactness, and efficiency of slot access and message sending. There may be interesting trade-offs to be made.} % -If instances are allocated [FIXME] +If instances are allocated \fixme{incomplete} %%%----- That's all, folks --------------------------------------------------