%%% -*-latex-*-
%%%
%%% Conceptual background
%%%
%%% (c) 2015 Straylight/Edgeware
%%%

%%%----- Licensing notice ---------------------------------------------------
%%%
%%% This file is part of the Sensible Object Design, an object system for C.
%%%
%%% SOD is free software; you can redistribute it and/or modify
%%% it under the terms of the GNU General Public License as published by
%%% the Free Software Foundation; either version 2 of the License, or
%%% (at your option) any later version.
%%%
%%% SOD is distributed in the hope that it will be useful,
%%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%%% GNU General Public License for more details.
%%%
%%% You should have received a copy of the GNU General Public License
%%% along with SOD; if not, write to the Free Software Foundation,
%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

\chapter{Concepts} \label{ch:concepts}

%%%--------------------------------------------------------------------------
\section{Modules} \label{sec:concepts.modules}

A \emph{module} is the top-level syntactic unit of input to the Sod
translator. As described above, given an input module, the translator
generates C source and header files.

A module can \emph{import} other modules. This makes the type names and
classes defined in those other modules available to class definitions in the
importing module. Sod's module system is intentionally very simple. There
are no private declarations or attempts to hide things.

As well as importing existing modules, a module can include a number of
different kinds of \emph{items}:
\begin{itemize}
\item \emph{class definitions} describe new classes, possibly in terms of
  existing classes;
\item \emph{type name declarations} introduce new type names to Sod's
  parser;\footnote{%
    This is unfortunately necessary because C syntax, upon which Sod's input
    language is based for obvious reasons, needs to treat type names
    differently from other kinds of identifiers.} %
  and
\item \emph{code fragments} contain literal C code to be dropped into an
  appropriate place in an output file.
\end{itemize}
Each kind of item, and, indeed, a module as a whole, can have a collection of
\emph{properties} associated with it. A property has a \emph{name} and a
\emph{value}. Properties are an open-ended way of attaching additional
information to module items, so extensions can make use of them without
having to implement additional syntax.

%%%--------------------------------------------------------------------------
\section{Classes, instances, and slots} \label{sec:concepts.classes}

For the most part, Sod takes a fairly traditional view of what it means to be
an object system.

An \emph{object} maintains \emph{state} and exhibits \emph{behaviour}. An
object's state is maintained in named \emph{slots}, each of which can store a
C value of an appropriate (scalar or aggregate) type. An object's behaviour
is stimulated by sending it \emph{messages}. A message has a name, and may
carry a number of arguments, which are C values; sending a message may result
in the state of the receiving object (or other objects) being changed, and a
C value being returned to the sender.

Every object is a (direct) instance of some \emph{class}. The class
determines which slots its instances have, which messages its instances can
be sent, and which methods are invoked when those messages are received. The
Sod translator's main job is to read class definitions and convert them into
appropriate C declarations, tables, and functions. An object cannot
(usually) change its direct class, and the direct class of an object is not
affected by, for example, the static type of a pointer to it.


\subsection{Superclasses and inheritance}
\label{sec:concepts.classes.inherit}

\subsubsection{Class relationships}
Each class has zero or more \emph{direct superclasses}.

A class with no direct superclasses is called a \emph{root class}. The Sod
runtime library includes a root class named @|SodObject|; making new root
classes is somewhat tricky, and won't be discussed further here.

Classes can have more than one direct superclass, i.e., Sod supports
\emph{multiple inheritance}. A Sod class definition for a class~$C$ lists
the direct superclasses of $C$ in a particular order. This order is called
the \emph{local precedence order} of $C$, and the list which consists of $C$
followed by $C$'s direct superclasses in local precedence order is called
$C$'s \emph{local precedence list}.

Multiple inheritance in Sod works much as it does in Lisp-like languages
such as Common Lisp, EuLisp, and Dylan, and in Python, and very differently
from how multiple inheritance works in \Cplusplus.\footnote{%
  The latter can be summarized as `badly'. By default in \Cplusplus, an
  instance receives an additional copy of a superclass's state for each path
  through the class graph from the instance's direct class to that
  superclass, though this behaviour can be overridden by declaring
  superclasses to be @|virtual|. Also, \Cplusplus\ offers only trivial
  method combination (\xref{sec:concepts.methods}), leaving programmers to
  deal with delegation manually and (usually) statically.} %

If $C$ is a class, then the \emph{superclasses} of $C$ are
\begin{itemize}
\item $C$ itself, and
\item the superclasses of each of $C$'s direct superclasses.
\end{itemize}
The \emph{proper superclasses} of a class $C$ are the superclasses of $C$
except for $C$ itself. If a class $B$ is a (direct, proper) superclass of
$C$, then $C$ is a \emph{(direct, proper) subclass} of $B$. If $C$ is a root
class then the only superclass of $C$ is $C$ itself, and $C$ has no proper
superclasses.

If an object is a direct instance of class~$C$ then the object is also an
(indirect) instance of every superclass of $C$.

If $C$ has a proper superclass $B$, then $B$ must not have $C$ as a direct
superclass. In other words, if we construct a graph whose vertices are
classes, and draw an edge from each class to each of its direct superclasses,
then this graph must be acyclic. In yet other terms, the `is a superclass
of' relation is a partial order on classes.
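These definitions are easy to check mechanically. The following C sketch assumes a hypothetical representation, invented for illustration and not used by Sod, in which classes are array indices and each class lists its direct superclasses, terminated by @|-1|. The function @|is_superclass| follows the recursive definition of `superclass' directly, and @|graph_is_acyclic| uses a depth-first search to confirm that no class can be reached from itself by following direct-superclass edges.

```c
#include <stdbool.h>

/* Hypothetical representation, for illustration only: each class is an
 * index into an array and lists its direct superclasses, then -1. */
#define MAXCLASS 16
#define MAXSUPER 4

struct demo_class {
  const char *name;
  int supers[MAXSUPER + 1];
};

/* Is `b' a superclass of `c'?  Either b is c itself, or b is a superclass
 * of one of c's direct superclasses.  Only terminates on acyclic graphs. */
static bool is_superclass(const struct demo_class *cls, int b, int c)
{
  if (b == c) return true;
  for (const int *s = cls[c].supers; *s >= 0; s++)
    if (is_superclass(cls, b, *s)) return true;
  return false;
}

/* Depth-first search: meeting a class that is still on the current path
 * (GREY) means the direct-superclass edges lead back to it, i.e., a cycle. */
enum { WHITE, GREY, BLACK };

static bool visit(const struct demo_class *cls, int c, int *colour)
{
  if (colour[c] == GREY) return false;
  if (colour[c] == BLACK) return true;
  colour[c] = GREY;
  for (const int *s = cls[c].supers; *s >= 0; s++)
    if (!visit(cls, *s, colour)) return false;
  colour[c] = BLACK;
  return true;
}

/* A class graph is valid only if it is acyclic (requires nclass <= MAXCLASS). */
static bool graph_is_acyclic(const struct demo_class *cls, int nclass)
{
  int colour[MAXCLASS] = { WHITE };
  for (int c = 0; c < nclass; c++)
    if (!visit(cls, c, colour)) return false;
  return true;
}

/* A valid graph: 0 = SodObject, 1 = A, 2 = B, 3 = E, where E: A, B. */
static const struct demo_class good[] = {
  { "SodObject", { -1 } },
  { "A", { 0, -1 } },
  { "B", { 0, -1 } },
  { "E", { 1, 2, -1 } },
};

/* An invalid graph: X and Y each list the other as a direct superclass. */
static const struct demo_class bad[] = {
  { "X", { 1, -1 } },
  { "Y", { 0, -1 } },
};
```

With these tables, @|is_superclass(good, 0, 3)| holds, since @|SodObject| is a superclass of @|E|, but @|graph_is_acyclic(bad, 2)| is false.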

\subsubsection{The class precedence list}
This partial order is not quite sufficient for our purposes. For each class
$C$, we shall need to extend it into a total order on $C$'s superclasses.
This calculation is called \emph{superclass linearization}, and the result is
a \emph{class precedence list}, which lists each of $C$'s superclasses
exactly once. If a superclass $B$ precedes (resp.\ follows) some other
superclass $A$ in $C$'s class precedence list, then we say that $B$ is a more
(resp.\ less) \emph{specific} superclass of $C$ than $A$ is.

The superclass linearization algorithm isn't fixed, and extensions to the
translator can introduce new linearizations for special effects, but the
following properties are expected to hold.
\begin{itemize}
\item The first class in $C$'s class precedence list is $C$ itself; i.e.,
  $C$ is always its own most specific superclass.
\item If $A$ and $B$ are both superclasses of $C$, and $A$ is a proper
  superclass of $B$, then $A$ appears after $B$ in $C$'s class precedence
  list, i.e., $B$ is a more specific superclass of $C$ than $A$ is.
\end{itemize}
The default linearization algorithm used in Sod is the \emph{C3} algorithm,
which has a number of good properties described in~\cite{Barrett:1996:MSL}.
It works as follows.
\begin{itemize}
\item A \emph{merge} of some number of input lists is a single list
  containing each item that is in any of the input lists exactly once, and no
  other items; if an item $x$ appears before an item $y$ in any input list,
  then $x$ also appears before $y$ in the merge. If a collection of lists
  has no merge then the lists are said to be \emph{inconsistent}.
\item The class precedence list of a class $C$ is a merge of the local
  precedence list of $C$ together with the class precedence lists of each of
  $C$'s direct superclasses.
\item If there are no such merges, then the definition of $C$ is invalid.
\item Suppose that there are multiple candidate merges. Consider the
  earliest position in these candidate merges at which they disagree. The
  \emph{candidate classes} at this position are the classes appearing at this
  position in the candidate merges. No two candidate classes can both be
  superclasses of the same direct superclass of $C$, since otherwise they
  would be ordered by that direct superclass's class precedence list. The
  class precedence list contains, at this position, the candidate class which
  is a superclass of the direct superclass of $C$ appearing earliest in $C$'s
  local precedence order.
\end{itemize}

\begin{figure}
  \centering
  \begin{tikzpicture}[x=7.5mm, y=-14mm, baseline=(current bounding box.east)]
    \node[lit] at ( 0, 0) (R) {SodObject};
    \node[lit] at (-3, +1) (A) {A}; \draw[->] (A) -- (R);
    \node[lit] at (-1, +1) (B) {B}; \draw[->] (B) -- (R);
    \node[lit] at (+1, +1) (C) {C}; \draw[->] (C) -- (R);
    \node[lit] at (+3, +1) (D) {D}; \draw[->] (D) -- (R);
    \node[lit] at (-2, +2) (E) {E}; \draw[->] (E) -- (A);
                                    \draw[->] (E) -- (B);
    \node[lit] at (+2, +2) (F) {F}; \draw[->] (F) -- (A);
                                    \draw[->] (F) -- (D);
    \node[lit] at (-1, +3) (G) {G}; \draw[->] (G) -- (E);
                                    \draw[->] (G) -- (C);
    \node[lit] at (+1, +3) (H) {H}; \draw[->] (H) -- (F);
    \node[lit] at ( 0, +4) (I) {I}; \draw[->] (I) -- (G);
                                    \draw[->] (I) -- (H);
  \end{tikzpicture}
  \quad
  \vrule
  \quad
  \begin{minipage}[c]{0.45\hsize}
    \begin{nprog}
      class A: SodObject \{ \}\quad\=@/* @|A|, @|SodObject| */ \\
      class B: SodObject \{ \}\>@/* @|B|, @|SodObject| */ \\
      class C: SodObject \{ \}\>@/* @|C|, @|SodObject| */ \\
      class D: SodObject \{ \}\>@/* @|D|, @|SodObject| */ \\+
      class E: A, B \{ \}\quad\=@/* @|E|, @|A|, @|B|, \dots */ \\
      class F: A, D \{ \}\>@/* @|F|, @|A|, @|D|, \dots */ \\+
      class G: E, C \{ \}\>@/* @|G|, @|E|, @|A|,
                            @|B|, @|C|, \dots */ \\
      class H: F \{ \}\>@/* @|H|, @|F|, @|A|, @|D|, \dots */ \\+
      class I: G, H \{ \}\>@/* @|I|, @|G|, @|E|, @|H|, @|F|,
                            @|A|, @|B|, @|C|, @|D|, \dots */
    \end{nprog}
  \end{minipage}

  \caption{An example class graph and class precedence lists}
  \label{fig:concepts.classes.cpl-example}
\end{figure}

\begin{example}
  Consider the class relationships shown in
  \xref{fig:concepts.classes.cpl-example}.

  \begin{itemize}

  \item @|SodObject| has no proper superclasses. Its class precedence list
    is therefore simply $\langle @|SodObject| \rangle$.

  \item In general, if $Y$ is the only direct superclass of $X$, and $Y$'s
    class precedence list is $\langle Y, \ldots \rangle$, then $X$'s class
    precedence list is $\langle X, Y, \ldots \rangle$. This explains $A$,
    $B$, $C$, $D$, and $H$.

  \item $E$'s list is found by merging its local precedence list $\langle E,
    A, B \rangle$ with the class precedence lists of its direct superclasses,
    which are $\langle A, @|SodObject| \rangle$ and $\langle B, @|SodObject|
    \rangle$. Clearly, @|SodObject| must be last, and $E$'s local precedence
    list orders the rest, giving $\langle E, A, B, @|SodObject| \rangle$.
    $F$ is similar.

  \item We determine $G$'s class precedence list by merging the three lists
    $\langle G, E, C \rangle$, $\langle E, A, B, @|SodObject| \rangle$, and
    $\langle C, @|SodObject| \rangle$. The class precedence list begins
    $\langle G, E, \ldots \rangle$, but the individual lists don't order $A$
    and $C$. Comparing these to $G$'s direct superclasses, we see that $A$
    is a superclass of $E$, while $C$ is a superclass of (indeed, equal to)
    $C$; since $E$ precedes $C$ in $G$'s local precedence order, $A$ must
    precede $C$, as must $B$, and the final list is $\langle G, E, A, B, C,
    @|SodObject| \rangle$.

  \item Finally, we determine $I$'s class precedence list by merging $\langle
    I, G, H \rangle$, $\langle G, E, A, B, C, @|SodObject| \rangle$, and
    $\langle H, F, A, D, @|SodObject| \rangle$. The list begins $\langle I,
    G, \ldots \rangle$, and then we must break a tie between $E$ and $H$; but
    $E$ is a superclass of $G$, which precedes $H$ in $I$'s local precedence
    order, so $E$ wins. Next, $H$ and $F$ must precede $A$, since these are
    ordered by $H$'s class precedence list. Then $B$ and $C$ precede $D$,
    since the former are superclasses of $G$ while $D$ is a superclass only
    of $H$, and the final list is $\langle I, G, E, H, F, A, B, C, D,
    @|SodObject| \rangle$.

  \end{itemize}

  (This example combines elements from \cite{Barrett:1996:MSL} and
  \cite{Ducournau:1994:PMM}.)
\end{example}
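One straightforward way to render the C3 merge in C is sketched below; it follows the usual formulation of~\cite{Barrett:1996:MSL} and is not the translator's own code. The merge repeatedly appends the first list head that appears in no list's tail, scanning the heads of the direct superclasses' class precedence lists in local precedence order and then the local precedence order itself; if every remaining head appears in some tail, the lists are inconsistent. The index-based tables, which encode the graph of \xref{fig:concepts.classes.cpl-example}, and all the names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* The class graph of the figure, as hypothetical index-based tables. */
enum {
  CLS_SODOBJECT, CLS_A, CLS_B, CLS_C, CLS_D,
  CLS_E, CLS_F, CLS_G, CLS_H, CLS_I, NCLASS
};
#define MAXSUPER 3

static const char *const names[NCLASS] = {
  "SodObject", "A", "B", "C", "D", "E", "F", "G", "H", "I"
};

/* Direct superclasses of each class in local precedence order, then -1. */
static const int supers[NCLASS][MAXSUPER + 1] = {
  [CLS_SODOBJECT] = { -1 },
  [CLS_A] = { CLS_SODOBJECT, -1 }, [CLS_B] = { CLS_SODOBJECT, -1 },
  [CLS_C] = { CLS_SODOBJECT, -1 }, [CLS_D] = { CLS_SODOBJECT, -1 },
  [CLS_E] = { CLS_A, CLS_B, -1 },  [CLS_F] = { CLS_A, CLS_D, -1 },
  [CLS_G] = { CLS_E, CLS_C, -1 },  [CLS_H] = { CLS_F, -1 },
  [CLS_I] = { CLS_G, CLS_H, -1 },
};

static int cpl[NCLASS][NCLASS];       /* computed class precedence lists */
static int cpl_len[NCLASS];
static bool cpl_done[NCLASS];

/* C3: c's list is c itself followed by a merge of its direct superclasses'
 * lists and its local precedence order.  Returns false if inconsistent. */
static bool compute_cpl(int c)
{
  const int *list[MAXSUPER + 1];
  int len[MAXSUPER + 1], pos[MAXSUPER + 1];
  int nlists = 0, n = 0;

  if (cpl_done[c]) return true;
  for (const int *s = supers[c]; *s >= 0; s++) {
    if (!compute_cpl(*s)) return false;
    list[nlists] = cpl[*s]; len[nlists] = cpl_len[*s]; pos[nlists] = 0;
    nlists++;
  }
  /* The local precedence order: the direct superclasses, in order. */
  list[nlists] = supers[c]; len[nlists] = nlists; pos[nlists] = 0;
  nlists++;

  cpl[c][n++] = c;
  for (;;) {
    int pick = -1;
    bool remaining = false;

    /* Scan the lists in order for a head that is in no list's tail. */
    for (int i = 0; i < nlists && pick < 0; i++) {
      if (pos[i] == len[i]) continue;
      remaining = true;
      int cand = list[i][pos[i]];
      bool in_tail = false;
      for (int j = 0; j < nlists && !in_tail; j++)
        for (int k = pos[j] + 1; k < len[j]; k++)
          if (list[j][k] == cand) { in_tail = true; break; }
      if (!in_tail) pick = cand;
    }
    if (pick < 0) {
      if (!remaining) break;          /* all lists used up: finished */
      return false;                   /* no acceptable head: inconsistent */
    }
    cpl[c][n++] = pick;
    for (int i = 0; i < nlists; i++)  /* remove pick from the list heads */
      if (pos[i] < len[i] && list[i][pos[i]] == pick) pos[i]++;
  }
  cpl_len[c] = n;
  cpl_done[c] = true;
  return true;
}

static void print_cpl(int c)
{
  for (int i = 0; i < cpl_len[c]; i++)
    printf("%s%s", i ? ", " : "", names[cpl[c][i]]);
  putchar('\n');
}

/* Does c's computed list equal want[0 .. n - 1]? */
static bool cpl_is(int c, const int *want, int n)
{
  if (cpl_len[c] != n) return false;
  for (int i = 0; i < n; i++)
    if (cpl[c][i] != want[i]) return false;
  return true;
}
```

Calling @|compute_cpl(CLS_I)| and then @|print_cpl(CLS_I)| prints @|I, G, E, H, F, A, B, C, D, SodObject|, matching the list derived by hand above.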

\subsubsection{Class links and chains}
The definition for a class $C$ may distinguish one of its proper superclasses
as being the \emph{link superclass} for class $C$. Not every class need have
a link superclass, and the link superclass of a class $C$, if it exists, need
not be a direct superclass of $C$.

Superclass links must obey the following rule: if $C$ is a class, then there
must be no three distinct superclasses $X$, $Y$ and~$Z$ of $C$ such that $Z$
is the link superclass of both $X$ and $Y$. As a consequence of this rule,
the superclasses of $C$ can be partitioned into linear \emph{chains}, such
that superclasses $A$ and $B$ are in the same chain if and only if one can
trace a path from $A$ to $B$ by following superclass links, or \emph{vice
  versa}.

Since a class links only to one of its proper superclasses, the classes in a
chain are naturally ordered from most- to least-specific. The least specific
class in a chain is called the \emph{chain head}; the most specific class is
the \emph{chain tail}. Chains are often named after their chain head
classes.
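Because each class has at most one link superclass, and the rule above forbids two superclasses of $C$ from sharing one, the chains can be recovered simply by following links. In the sketch below, the representation and names are hypothetical: @|link[x]| holds the index of $x$'s link superclass, or @|-1| if it has none, and @|sup| lists the superclasses of some class $C$. The function @|links_ok| checks the rule, @|chain_head| follows links to the least specific class of a chain, and @|is_chain_tail| recognizes the most specific one. The example classes @|Widget|, @|Mixin| and @|Button| are made up.

```c
#include <stdbool.h>

/* Hypothetical representation: link[x] is the index of x's link
 * superclass, or -1 if x has none; sup[0 .. n - 1] lists the distinct
 * superclasses of some class C, including C itself. */

/* The rule: no class may be the link superclass of two distinct
 * superclasses of C. */
static bool links_ok(const int *sup, int n, const int *link)
{
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      if (link[sup[i]] >= 0 && link[sup[i]] == link[sup[j]])
        return false;
  return true;
}

/* Follow links until reaching a class with no link superclass: that class
 * is the head (least specific member) of x's chain. */
static int chain_head(int x, const int *link)
{
  while (link[x] >= 0) x = link[x];
  return x;
}

/* x is its chain's tail (most specific member) if no superclass of C
 * links to it. */
static bool is_chain_tail(int x, const int *sup, int n, const int *link)
{
  for (int i = 0; i < n; i++)
    if (link[sup[i]] == x) return false;
  return true;
}

/* Made-up example: 0 = SodObject, 1 = Widget (links to SodObject),
 * 2 = Mixin (no link), 3 = Button (links to Widget).  All four are
 * superclasses of Button, giving the chains {Button, Widget, SodObject}
 * and {Mixin}. */
static const int ex_link[] = { -1, 0, -1, 1 };
static const int ex_sup[] = { 3, 1, 2, 0 };
```

Here @|chain_head| maps both @|Button| and @|Widget| to @|SodObject|, so they share a chain, while @|Mixin| is both head and tail of a chain of its own.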

\subsection{Names}
\label{sec:concepts.classes.names}

Classes have a number of other attributes:
\begin{itemize}
\item A \emph{name}, which is a C identifier. Class names must be globally
  unique. The class name is used in the names of a number of associated
  definitions, to be described later.
\item A \emph{nickname}, which is also a C identifier. Unlike names,
  nicknames are not required to be globally unique. If $C$ is any class,
  then all the superclasses of $C$ must have distinct nicknames.
\end{itemize}


\subsection{Slots} \label{sec:concepts.classes.slots}

Each class defines a number of \emph{slots}. Much like a structure member, a
slot has a \emph{name}, which is a C identifier, and a \emph{type}. Unlike
many other object systems, different superclasses of a class $C$ can define
slots with the same name without ambiguity, since slot references are always
qualified by the defining class's nickname.

\subsubsection{Slot initializers}
As well as defining slot names and types, a class can also associate an
\emph{initial value} with each slot defined by itself or one of its
superclasses. A class $C$ provides an \emph{initialization message} (see
\xref{sec:concepts.lifecycle.birth} and \xref{sec:structures.root.sodclass})
whose methods set the slots of a \emph{direct} instance of the class to the
correct initial values. If several of $C$'s superclasses define initializers
for the same slot then the initializer from the most specific such class is
used. If none of $C$'s superclasses define an initializer for some slot then
that slot will be left uninitialized.

The initializer for a slot with scalar type may be any C expression. The
initializer for a slot with aggregate type must contain only constant
expressions if the generated code is expected to be processed by an
implementation of C89. Initializers will be evaluated once each time an
instance is initialized.

Slots are initialized in reverse-precedence order of their defining classes;
i.e., slots defined by a less specific superclass are initialized earlier
than slots defined by a more specific superclass. Slots defined by the same
class are initialized in the order in which they appear in the class
definition.
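The choice of initializer and the order of initialization can be simulated without generating any code. This sketch uses a made-up representation (@|demo_slot|, @|demo_init|, @|plan_init|, and the example classes @|Base| and @|Sub| are all hypothetical): it visits $C$'s class precedence list from least to most specific, takes each class's slots in definition order, and for each slot picks the initializer supplied by the most specific class, or none. Initializer expressions are kept as strings, since only the planning, not the evaluation, is modelled here.

```c
#include <stddef.h>
#include <string.h>

/* A simplified, hypothetical model of the rules above. */
struct demo_slot { const char *owner, *name; };              /* owner.name */
struct demo_init { const char *cls, *owner, *name, *expr; }; /* cls sets owner.name */
struct demo_step { const struct demo_slot *slot; const char *expr; };

/* Position of class `c' in the class precedence list (0 = most specific),
 * or -1 if c is not a superclass at all. */
static int cpl_index(const char *const *cpl, int ncpl, const char *c)
{
  for (int i = 0; i < ncpl; i++)
    if (strcmp(cpl[i], c) == 0) return i;
  return -1;
}

/* Fill `steps' with the slots in initialization order, each paired with
 * the initializer chosen for it (NULL: left uninitialized).  Returns the
 * number of steps. */
static int plan_init(const char *const *cpl, int ncpl,
                     const struct demo_slot *slots, int nslots,
                     const struct demo_init *inits, int ninits,
                     struct demo_step *steps)
{
  int n = 0;

  for (int k = ncpl - 1; k >= 0; k--)       /* least specific class first */
    for (int s = 0; s < nslots; s++) {      /* then in definition order */
      if (strcmp(slots[s].owner, cpl[k]) != 0) continue;
      const char *expr = NULL;
      int best = ncpl;
      for (int i = 0; i < ninits; i++) {    /* most specific initializer wins */
        int pos = cpl_index(cpl, ncpl, inits[i].cls);
        if (pos >= 0 && pos < best &&
            strcmp(inits[i].owner, slots[s].owner) == 0 &&
            strcmp(inits[i].name, slots[s].name) == 0) {
          best = pos;
          expr = inits[i].expr;
        }
      }
      steps[n].slot = &slots[s];
      steps[n].expr = expr;
      n++;
    }
  return n;
}

/* Example: Sub's class precedence list is Sub, Base, SodObject. */
static const char *const ex_cpl[] = { "Sub", "Base", "SodObject" };
static const struct demo_slot ex_slots[] = {
  { "Base", "x" }, { "Base", "y" }, { "Sub", "z" },
};
static const struct demo_init ex_inits[] = {
  { "Base", "Base", "x", "1" },   /* Base's own initializer for x ... */
  { "Sub",  "Base", "x", "2" },   /* ... is overridden by the more specific Sub */
  { "Sub",  "Sub",  "z", "3" },   /* nobody initializes Base.y */
};
```

For @|Sub|, the plan is @|x| with @|2|, then @|y| left uninitialized, then @|z| with @|3|: @|Base|'s slots come first because @|Base| is less specific than @|Sub|.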

The initializer for a slot may refer to other slots in the same object, via
the @|me| pointer: in an initializer for a slot defined by a class $C$, @|me|
has type `pointer to $C$'. (Note that the type of @|me| depends only on the
class which defined the slot, not the class which defined the initializer.)

A class can also define \emph{class slot initializers}, which provide values
for a slot defined by its metaclass; see \xref{sec:concepts.metaclasses} for
details.


\subsection{C language integration} \label{sec:concepts.classes.c}

For each class~$C$, the Sod translator defines a C type, the \emph{class
  type}, with the same name. This is the usual type used when considering an
object as an instance of class~$C$. No entire object will normally have a
class type,\footnote{%
  In general, a class type only captures the structure of one of the
  superclass chains of an instance. A full instance layout contains multiple
  chains. See \xref{sec:structures.layout} for the full details.} %
so access to instances is almost always via pointers.

\subsubsection{Access to slots}
The class type for a class~$C$ is actually a structure. It contains one
member for each class in $C$'s superclass chain, named with that class's
nickname. Each of these members is also a structure, containing the
corresponding class's slots, one member per slot. There's nothing special
about these slot members: C code can access them in the usual way.

For example, if @|MyClass| has the nickname @|mine|, and defines a slot @|x|
of type @|int|, then the simple function
\begin{prog}
  int get_x(MyClass *m) \{ return (m@->mine.x); \}
\end{prog}
will extract the value of @|x| from an instance of @|MyClass|.
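To see why this access works, it may help to write out by hand a structure of the shape just described. The definitions below are a simplified stand-in, not the translator's output: a generated class type also carries a vtable pointer and follows the rules in \xref{sec:structures.layout}. The root class's nickname @|obj|, its placeholder slot, and the helper @|set_x| are assumptions made only so that the example is complete, valid C.

```c
/* Hand-written stand-ins, NOT generated Sod output.  Only the access path
 * m->mine.x is taken from the text; everything else is assumed. */
struct demo_obj_slots { int placeholder; };   /* root class's slots (assumed) */
struct demo_mine_slots { int x; };            /* MyClass's own slots */

/* One member per class in the chain, each named by that class's nickname. */
typedef struct MyClass {
  struct demo_obj_slots obj;
  struct demo_mine_slots mine;
} MyClass;

/* Slot members are ordinary structure members. */
int get_x(MyClass *m) { return (m->mine.x); }
void set_x(MyClass *m, int x) { m->mine.x = x; }
```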

All of this means that there's no such thing as `private' or `protected'
slots. If you want to hide implementation details, the best approach is to
stash them in a dynamically allocated private structure, and leave a pointer
to it in a slot. (This will also help preserve binary compatibility, because
the private structure can grow more members as needed. See
\xref{sec:concepts.compatibility} for more details.)
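A minimal sketch of the private-structure approach follows. It assumes a hypothetical class type @|Widget| (nickname @|w|) with a single pointer slot @|priv|; all the names are invented. Client code sees only the incomplete type @|struct demo_private|, so its members can change without affecting @|Widget|'s layout. In a real class, the allocation and release would presumably be done by the lifecycle methods (\xref{sec:concepts.lifecycle}) rather than by the plain functions used here.

```c
#include <stdlib.h>

/* Public part: client code sees only an incomplete type and a pointer. */
struct demo_private;

/* Hypothetical stand-in for a class type whose class (nickname `w') has a
 * single slot `priv'; not generated Sod output. */
typedef struct Widget {
  struct { struct demo_private *priv; } w;
} Widget;

/* Private part: visible only in the implementation file.  Members can be
 * added here later without changing the layout of Widget itself. */
struct demo_private {
  int count;
};

/* Plain functions standing in for what lifecycle methods would do. */
int widget_setup(Widget *me)
{
  me->w.priv = calloc(1, sizeof *me->w.priv);
  return me->w.priv ? 0 : -1;
}

void widget_bump(Widget *me) { me->w.priv->count++; }
int widget_count(const Widget *me) { return me->w.priv->count; }

void widget_release(Widget *me)
{
  free(me->w.priv);
  me->w.priv = NULL;
}
```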


\subsubsection{Sending messages}
Sod defines a macro for each message. If a class $C$ defines a message $m$,
then the macro is called @|$C$_$m$|. The macro takes a pointer to the
receiving object as its first argument, followed by the message arguments, if
any, and returns the value returned by the object's effective method for the
message (if any). If you have a pointer to an instance of any of $C$'s
subclasses, then you can send it the message; it doesn't matter whether the
subclass is on the same chain. Note that the receiver argument is evaluated
twice, so it's not safe to write a receiver expression which has
side-effects.

For example, suppose we defined
\begin{prog}
  [nick = soupy] \\
  class Super: SodObject \{ \\ \ind
    void msg(const char *m); \-\\
  \} \\+
  class Sub: Super \{ \\ \ind
    void soupy.msg(const char *m)
      \{ printf("sub sent `\%s'@\\n", m); \} \-\\
  \}
\end{prog}
then we can send the message like this:
\begin{prog}
  Sub *sub = /* \dots\ */; \\
  Super_msg(sub, "hello");
\end{prog}

What happens under the covers is as follows. The structure pointed to by the
instance pointer has a member named @|_vt|, which points to a structure
called a `virtual table', or \emph{vtable}, which contains various pieces of
information about the object's direct class and layout, and holds pointers to
method entries for the messages which the object can receive. The
message-sending macro in the example above expands to something similar to
\begin{prog}
  sub@->_vt@->soupy.msg(sub, "hello");
\end{prog}

The vtable contains other useful information, such as a pointer to the
instance's direct class's \emph{class object} (described below). The full
details of the contents and layout of vtables are given in
\xref{sec:structures.layout.vtable}.
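The dispatch mechanism, and the reason a receiver expression with side-effects is unsafe, can be imitated in plain C. The sketch below is hand-written and deliberately simplified: the single-member vtable, the names @|demo_vt|, @|sub_msg| and @|next_object|, and the body of the @|Super_msg| macro are assumptions, not the translator's output, whose real layout is given in \xref{sec:structures.layout.vtable}.

```c
#include <stdio.h>

/* Hand-written imitation, NOT generated Sod code. */
struct Super;

/* A vtable holding one method-entry pointer per message. */
struct demo_vt {
  void (*msg)(struct Super *me, const char *m);
};

/* Every instance starts with a pointer to its class's vtable. */
typedef struct Super { const struct demo_vt *_vt; } Super;

/* Message-sending macro: `me' appears twice in the expansion. */
#define Super_msg(me, m) ((me)->_vt->msg((me), (m)))

static int sends;                 /* how many times the method ran */
static int evals;                 /* how many times next_object() ran */

/* Method entry installed by the subclass. */
static void sub_msg(Super *me, const char *m)
{
  (void)me;
  sends++;
  printf("sub sent `%s'\n", m);
}

static const struct demo_vt sub_vt = { sub_msg };

/* A receiver expression with a side-effect, to show double evaluation. */
static Super *next_object(Super *p)
{
  evals++;
  return p;
}
```

Sending the message to a receiver computed by a call to @|next_object| runs the method once but evaluates that call twice, because the macro mentions its receiver argument twice.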


\subsubsection{Class objects}
In Sod's object system, classes are objects too. Therefore classes are
themselves instances; the class of a class is called a \emph{metaclass}. The
consequences of this are explored in \xref{sec:concepts.metaclasses}. The
\emph{class object} has the same name as the class, suffixed with
`@|__class|',\footnote{%
  This is not quite true. @|$C$__class| is actually a macro. See
  \xref{sec:structures.layout.additional} for the gory details.} %
and its type is usually @|SodClass|; @|SodClass|'s nickname is @|cls|.

A class object's slots contain or point to useful information, tables and
functions for working with that class's instances. (The @|SodClass| class
doesn't define any messages, so it doesn't have any methods other than for
the @|SodObject| lifecycle messages @|init| and @|teardown|; see
\xref{sec:concepts.lifecycle}. In Sod, a class slot containing a function
pointer is not at all the same thing as a method.)

\subsubsection{Conversions}
Suppose one has a value of type pointer-to-class-type for some class~$C$, and
wants to convert it to a pointer-to-class-type for some other class~$B$.
There are three main cases to distinguish.
\begin{itemize}
\item If $B$ is a superclass of~$C$, in the same chain, then the conversion
  is an \emph{in-chain upcast}. The conversion can be performed using the
  appropriate generated upcast macro (see below), or by simply casting the
  pointer, using C's usual cast operator (or the \Cplusplus\ @|static_cast<>|
  operator).
\item If $B$ is a superclass of~$C$, in a different chain, then the
  conversion is a \emph{cross-chain upcast}. The conversion is more than a
  simple type change: the pointer value must be adjusted. If the direct
  class of the instance in question is not known, the conversion will require
  a lookup at runtime to find the appropriate offset by which to adjust the
  pointer. The conversion can be performed using the appropriate generated
  upcast macro (see below); the general case is handled by the macro
  \descref{SOD_XCHAIN}{mac}.
\item If $B$ is a subclass of~$C$ then the conversion is a \emph{downcast};
  otherwise the conversion is a~\emph{cross-cast}. In either case, the
  conversion can fail: the object in question might not be an instance of~$B$
  after all. The macro \descref{SOD_CONVERT}{mac} and the function
  \descref{sod_convert}{fun} perform general conversions. They return a null
  pointer if the conversion fails. (These are therefore your analogue to the
  \Cplusplus\ @|dynamic_cast<>| operator.)
\end{itemize}
The Sod translator generates macros for performing both in-chain and
cross-chain upcasts. For each class~$C$, and each proper superclass~$B$
of~$C$, a macro is defined: given an argument of type pointer to class type
of~$C$, it returns a pointer to the same instance, only with type pointer to
class type of~$B$, adjusted as necessary in the case of a cross-chain
conversion. The macro is named by concatenating
\begin{itemize}
\item the name of class~$C$, in upper case,
\item the characters `@|__CONV_|', and
\item the nickname of class~$B$, in upper case;
\end{itemize}
e.g., if $C$ is named @|MyClass|, and $B$'s name is @|SuperClass| with
nickname @|super|, then the macro @|MYCLASS__CONV_SUPER| converts a
@|MyClass~*| to a @|SuperClass~*|. See
\xref{sec:structures.layout.additional} for the formal description.
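The pointer adjustment behind a cross-chain conversion, and the null result of a failed one, can be illustrated with ordinary structures. Everything in this sketch is hypothetical: the two-chain layout, the @|has_right| and @|to_right| vtable fields, and the function @|demo_left_to_right|. Real code should use the generated @|__CONV_| macros, @|SOD_XCHAIN| or @|SOD_CONVERT|, whose actual mechanism is described in \xref{sec:structures.layout}.

```c
#include <stddef.h>

/* Hypothetical layout with two chains, not Sod's real one.  The Left chain
 * starts with a vtable pointer; the vtable records whether the object also
 * has a Right chain and, if so, its byte offset from the Left chain. */
struct demo_vt { int has_right; ptrdiff_t to_right; };

typedef struct { const struct demo_vt *_vt; int x; } Left;
typedef struct { int y; } Right;

/* Instance layout of a class with both chains ... */
struct both_ilayout { Left left; Right right; };
static const struct demo_vt both_vt = {
  1,
  (ptrdiff_t)offsetof(struct both_ilayout, right) -
    (ptrdiff_t)offsetof(struct both_ilayout, left)
};

/* ... and the vtable of a class with only the Left chain. */
static const struct demo_vt left_only_vt = { 0, 0 };

/* Checked conversion: the offset is looked up at run time through the
 * vtable, since the object's direct class is not known statically; if the
 * object has no Right chain at all, return a null pointer. */
static Right *demo_left_to_right(Left *p)
{
  if (!p->_vt->has_right) return NULL;
  return (Right *)((char *)p + p->_vt->to_right);
}
```

For an in-chain upcast the corresponding offset would be zero, which is why a plain cast suffices there.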

%%%--------------------------------------------------------------------------
\section{Keyword arguments} \label{sec:concepts.keywords}

In standard C, the actual arguments provided to a function are matched up
with the formal arguments given in the function definition according to their
ordering in a list. Unless the (rather cumbersome) machinery for dealing
with variable-length argument tails (@|<stdarg.h>|) is used, exactly the
correct number of arguments must be supplied, and in the correct order.

A \emph{keyword argument} is matched by its distinctive \emph{name}, rather
than by its position in a list. Keyword arguments may be \emph{omitted},
causing some default behaviour by the function. A function can detect
whether a particular keyword argument was supplied, so the default behaviour
need not be the same as that caused by any specific value of the argument.

Keyword arguments can be provided in three ways.
\begin{enumerate}
\item Directly, as a variable-length argument tail, consisting (for the most
  part) of alternating keyword names, as pointers to null-terminated strings,
  and argument values, and terminated by a null pointer. This is somewhat
  error-prone, and the support library defines some macros which help ensure
  that keyword argument lists are well formed.
\item Indirectly, through a @|va_list| object capturing a variable-length
  argument tail passed to some other function. Such indirect argument tails
  have the same structure as the direct argument tails described above.
  Because @|va_list| objects are hard to copy, the keyword-argument support
  library consistently passes @|va_list| objects \emph{by reference}
  throughout its programming interface.
\item Indirectly, through a vector of @|struct kwval| objects, each of which
  contains a keyword name, as a pointer to a null-terminated string, and the
  \emph{address} of a corresponding argument value. (This indirection is
  necessary so that the items in the vector can be of uniform size.)
  Argument vectors are rather inconvenient to use, but are the only practical
  way in which a caller can decide at runtime which arguments to include in a
  call, which is useful when writing wrapper functions.
\end{enumerate}
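The three conventions can be sketched by hand. This is not Sod's keyword-argument support library, whose macros and functions are described elsewhere; the function names, the @|demo_kwval| structure, and the keywords @|width| and @|label| are all hypothetical. Note how the @|width_set| flag lets the function tell an omitted @|width| from one that happens to equal the default, and how the @|va_list| is handed on by reference.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* A hand-rolled sketch, NOT Sod's keyword support library.  The keywords
 * "width" (int) and "label" (const char *) are made up. */
struct demo_opts {
  int width, width_set;        /* width_set: was "width" supplied at all? */
  const char *label;
};

/* Vector entry: a keyword name and the ADDRESS of its value. */
struct demo_kwval { const char *kw; const void *val; };

static void demo_defaults(struct demo_opts *o)
{
  o->width = 80; o->width_set = 0; o->label = "(none)";
}

/* Indirect form: consume a keyword tail from a va_list passed by reference.
 * Returns -1 on an unknown keyword: the rest of the tail cannot be skipped,
 * since the type of that keyword's value is unknown. */
static int demo_parse_valist(struct demo_opts *o, va_list *ap)
{
  const char *kw;

  while ((kw = va_arg(*ap, const char *)) != NULL) {
    if (strcmp(kw, "width") == 0) {
      o->width = va_arg(*ap, int);
      o->width_set = 1;
    } else if (strcmp(kw, "label") == 0)
      o->label = va_arg(*ap, const char *);
    else
      return -1;
  }
  return 0;
}

/* Direct form: the tail follows the last named argument, and must end with
 * a null pointer of pointer type, e.g., (const char *)NULL. */
static int demo_direct(struct demo_opts *o, ...)
{
  va_list ap;
  int rc;

  demo_defaults(o);
  va_start(ap, o);
  rc = demo_parse_valist(o, &ap);
  va_end(ap);
  return rc;
}

/* Vector form: the caller can decide at run time which entries to pass. */
static int demo_vector(struct demo_opts *o,
                       const struct demo_kwval *v, size_t n)
{
  demo_defaults(o);
  for (size_t i = 0; i < n; i++) {
    if (strcmp(v[i].kw, "width") == 0) {
      o->width = *(const int *)v[i].val;
      o->width_set = 1;
    } else if (strcmp(v[i].kw, "label") == 0)
      o->label = *(const char *const *)v[i].val;
    else
      return -1;
  }
  return 0;
}
```

The terminator in a direct call is cast to a pointer type because a bare @|NULL| passed through a variable-length argument tail need not have pointer type.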

Keyword arguments are provided as a general feature for C functions.
However, Sod has special support for messages which accept keyword arguments
(\xref{sec:concepts.methods.keywords}); and they play an essential rôle in
the instance construction protocol (\xref{sec:concepts.lifecycle.birth}).

%%%--------------------------------------------------------------------------
\section{Messages and methods} \label{sec:concepts.methods}

Objects can be sent \emph{messages}. A message has a \emph{name}, and
carries a number of \emph{arguments}. When an object is sent a message, a
function, determined by the receiving object's class, is invoked, passing it
the receiver and the message arguments. This function is called the
class's \emph{effective method} for the message. The effective method can do
anything a C function can do, including reading or updating program state or
object slots, sending more messages, calling other functions, issuing system
calls, or performing I/O; if it finishes, it may return a value, which is
returned in turn to the message sender.

The set of messages an object can receive, characterized by their names,
argument types, and return type, is determined by the object's class. Each
class can define new messages, which can be received by any instance of that
class. The messages defined by a single class must have distinct names:
there is no `function overloading'. As with slots
(\xref{sec:concepts.classes.slots}), messages defined by distinct classes are
always distinct, even if they have the same names: references to messages are
always qualified by the defining class's name or nickname.

Messages may take any number of arguments, of any non-array value type.
Since message sends are effectively function calls, arguments of array type
are implicitly converted to values of the corresponding pointer type. While
message definitions may ascribe an array type to an argument, the formal
argument will have pointer type, as is usual for C functions. A message may
accept a variable-length argument suffix, denoted @|\dots|.
| 540 | |
| 541 | A class definition may include \emph{direct methods} for messages defined by |
| 542 | it or any of its superclasses. |
| 543 | |
| 544 | Like messages, direct methods define argument lists and return types, but |
| 545 | they may also have a \emph{body}, and a \emph{rôle}. |
| 546 | |
| 547 | A direct method need not have the same argument list or return type as its |
| 548 | message. The acceptable argument lists and return types for a method depend |
| 549 | on the message, in particular its method combination |
| 550 | (\xref{sec:concepts.methods.combination}), and the method's rôle. |
| 551 | |
| 552 | A direct method body is a block of C code, and the Sod translator usually |
| 553 | defines, for each direct method, a function with external linkage, whose body |
| 554 | contains a copy of the direct method body. Within the body of a direct |
| 555 | method defined for a class $C$, the variable @|me|, of type pointer to class |
| 556 | type of $C$, refers to the receiving object. |
| 557 | |
| 558 | |
| 559 | \subsection{Effective methods and method combinations} |
| 560 | \label{sec:concepts.methods.combination} |
| 561 | |
| 562 | For each message a direct instance of a class might receive, there is a set |
| 563 | of \emph{applicable methods}, which are exactly the direct methods defined on |
| 564 | the object's class and its superclasses. These direct methods are combined |
| 565 | together to form the \emph{effective method} for that particular class and |
| 566 | message. Direct methods can be combined into an effective method in |
| 567 | different ways, according to the \emph{method combination} specified by the |
| 568 | message. The method combination determines which direct method rôles are |
| 569 | acceptable, and, for each rôle, the appropriate argument lists and return |
| 570 | types. |
| 571 | |
| 572 | One direct method, $M$, is said to be more (resp.\ less) \emph{specific} than |
| 573 | another, $N$, with respect to a receiving class~$C$, if the class defining |
| 574 | $M$ is a more (resp.\ less) specific superclass of~$C$ than the class |
| 575 | defining $N$. |
| 576 | |
| 577 | \subsubsection{The standard method combination} |
| 578 | The default method combination is called the \emph{standard method |
| 579 | combination}; other method combinations are useful occasionally for special |
| 580 | effects. The standard method combination accepts four direct method rôles, |
| 581 | called `primary' (the default), @|before|, @|after|, and @|around|. |
| 582 | |
| 583 | All direct methods subject to the standard method combination must have |
| 584 | argument lists which \emph{match} the message's argument list: |
| 585 | \begin{itemize} |
| 586 | \item the method's arguments must have the same types as the message, though |
| 587 | the arguments may have different names; and |
| 588 | \item if the message accepts a variable-length argument suffix then the |
| 589 | direct method must instead have a final argument of type @|va_list|. |
| 590 | \end{itemize} |
| 591 | Primary and @|around| methods must have the same return type as the message; |
| 592 | @|before| and @|after| methods must return @|void| regardless of the |
| 593 | message's return type. |
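
In plain C terms, the second rule is the familiar pairing of a variadic
function with a worker taking a @|va_list|, as with @|printf| and
@|vprintf|. The sketch below is illustrative only: the names @|msg_sum| and
@|direct_sum| are hypothetical, and the code the translator actually
generates is different.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical direct method for a message declared as `int sum(int n, ...)':
 * the variable-length suffix arrives as a final argument of type va_list. */
int direct_sum(int n, va_list ap)
{
  int total = 0;

  while (n-- > 0) total += va_arg(ap, int);
  return total;
}

/* Hypothetical variadic entry point: it captures the suffix in a va_list
 * and hands that on to the method. */
int msg_sum(int n, ...)
{
  va_list ap;
  int r;

  va_start(ap, n);
  r = direct_sum(n, ap);
  va_end(ap);
  return r;
}
```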

If there are no applicable primary methods then no effective method is
constructed: the vtables contain null pointers in place of pointers to method
entry functions.

\begin{figure}
\hbox to\hsize{\hss\hbox{\begin{tikzpicture}
[order/.append style={color=green!70!black},
code/.append style={font=\sffamily},
action/.append style={font=\itshape},
method/.append style={rectangle, draw=black, thin, fill=blue!30,
text height=\ht\strutbox, text depth=\dp\strutbox,
minimum width=40mm}]

\def\delgstack#1#2#3{
\node (#10) [method, #2] {#3};
\node (#11) [method, above=6mm of #10] {#3};
\draw [->] ($(#10.north)!.5!(#10.north west) + (0mm, 1mm)$) --
++(0mm, 4mm)
node [code, left=4pt, midway] {next_method};
\draw [<-] ($(#10.north)!.5!(#10.north east) + (0mm, 1mm)$) --
++(0mm, 4mm)
node [action, right=4pt, midway] {return};
\draw [->] ($(#11.north)!.5!(#11.north west) + (0mm, 1mm)$) --
++(0mm, 4mm)
node [code, left=4pt, midway] {next_method}
node (ld) [above] {$\smash\vdots\mathstrut$};
\draw [<-] ($(#11.north)!.5!(#11.north east) + (0mm, 1mm)$) --
++(0mm, 4mm)
node [action, right=4pt, midway] {return}
node (rd) [above] {$\smash\vdots\mathstrut$};
\draw [->] ($(ld.north) + (0mm, 1mm)$) -- ++(0mm, 4mm)
node [code, left=4pt, midway] {next_method};
\draw [<-] ($(rd.north) + (0mm, 1mm)$) -- ++(0mm, 4mm)
node [action, right=4pt, midway] {return};
\node (p) at ($(ld.north)!.5!(rd.north)$) {};
\node (#1n) [method, above=5mm of p] {#3};
\draw [->, order] ($(#10.south east) + (4mm, 1mm)$) --
($(#1n.north east) + (4mm, -1mm)$)
node [midway, right, align=left]
{Most to \\ least \\ specific};}

\delgstack{a}{}{@|around| method}
\draw [<-] ($(a0.south)!.5!(a0.south west) - (0mm, 1mm)$) --
++(0mm, -4mm);
\draw [->] ($(a0.south)!.5!(a0.south east) - (0mm, 1mm)$) --
++(0mm, -4mm)
node [action, right=4pt, midway] {return};

\draw [->] ($(an.north)!.6!(an.north west) + (0mm, 1mm)$) --
++(-8mm, 8mm)
node [code, midway, left=3mm] {next_method}
node (b0) [method, above left = 1mm + 4mm and -6mm - 4mm] {};
\node (b1) [method] at ($(b0) - (2mm, 2mm)$) {};
\node (bn) [method] at ($(b1) - (2mm, 2mm)$) {@|before| method};
\draw [->, order] ($(bn.west) - (6mm, 0mm)$) -- ++(12mm, 12mm)
node [midway, above left, align=center] {Most to \\ least \\ specific};
\draw [->] ($(b0.north east) + (-10mm, 1mm)$) -- ++(8mm, 8mm)
node (p) {};

\delgstack{m}{above right=1mm and 0mm of an.west |- p}{Primary method}
\draw [->] ($(mn.north)!.5!(mn.north west) + (0mm, 1mm)$) -- ++(0mm, 4mm)
node [code, left=4pt, midway] {next_method}
node [above right = 0mm and -8mm]
{$\vcenter{\hbox{\Huge\textcolor{red}{!}}}
\vcenter{\hbox{\begin{tabular}[c]{l}
\textsf{next_method} \\
pointer is null
\end{tabular}}}$};

\draw [->, color=blue, dotted]
($(m0.south)!.2!(m0.south east) - (0mm, 1mm)$) --
($(an.north)!.2!(an.north east) + (0mm, 1mm)$)
node [midway, sloped, below] {Return value};

\draw [<-] ($(an.north)!.6!(an.north east) + (0mm, 1mm)$) --
++(8mm, 8mm)
node [action, midway, right=3mm] {return}
node (f0) [method, above right = 1mm and -6mm] {};
\node (f1) [method] at ($(f0) + (-2mm, 2mm)$) {};
\node (fn) [method] at ($(f1) + (-2mm, 2mm)$) {@|after| method};
\draw [<-, order] ($(f0.east) + (6mm, 0mm)$) -- ++(-12mm, 12mm)
node [midway, above right, align=center]
{Least to \\ most \\ specific};
\draw [<-] ($(fn.north west) + (6mm, 1mm)$) -- ++(-8mm, 8mm);

\end{tikzpicture}}\hss}

\caption{The standard method combination}
\label{fig:concepts.methods.stdmeth}
\end{figure}

The effective method for a message with standard method combination works as
follows (see also~\xref{fig:concepts.methods.stdmeth}).
\begin{enumerate}

\item If any applicable methods have the @|around| rôle, then the most
specific such method, with respect to the class of the receiving object, is
invoked.

Within the body of an @|around| method, the variable @|next_method| is
defined, having pointer-to-function type. The method may call this
function, as described below, any number of times.

If there are any remaining @|around| methods, then @|next_method| invokes the
next most specific such method, returning whichever value that method
returns; otherwise the behaviour of @|next_method| is to invoke the
@|before| methods (if any), followed by the most specific primary method,
followed by the @|after| methods (if any), and to return whichever value
was returned by the most specific primary method, as described in the
following items. That is, the behaviour of the least specific @|around|
method's @|next_method| function is exactly the behaviour that the
effective method would have if there were no @|around| methods. Note that
if the least specific @|around| method calls its @|next_method| more than
once then the whole sequence of @|before|, primary, and @|after| methods
occurs multiple times.

The value returned by the most specific @|around| method is the value
returned by the effective method.

\item If any applicable methods have the @|before| rôle, then they are all
invoked, starting with the most specific.

\item The most specific applicable primary method is invoked.

Within the body of a primary method, the variable @|next_method| is
defined, having pointer-to-function type. If there are no remaining less
specific primary methods, then @|next_method| is a null pointer.
Otherwise, the method may call the @|next_method| function any number of
times.

The behaviour of the @|next_method| function, if it is not null, is to
invoke the next most specific applicable primary method, and to return
whichever value that method returns.

If there are no applicable @|around| methods, then the value returned by
the most specific primary method is the value returned by the effective
method; otherwise the value returned by the most specific primary method is
returned to the least specific @|around| method, which called it via its
own @|next_method| function.

\item If any applicable methods have the @|after| rôle, then they are all
invoked, starting with the \emph{least} specific. (Hence, the most
specific @|after| method is invoked with the most `afterness'.)

\end{enumerate}
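
The rules above can be modelled in plain C. The following sketch is
hypothetical throughout: every name is invented, the applicable methods are
held in explicit arrays ordered most specific first, each method is told its
own position so that the modelled @|next_method| can find its successor, and
a primary method calls @|has_next_primary| where a real one would test
@|next_method| against null. It is not how the translator organizes its
output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the standard method combination for a message
 * `int m(int x)'.  Every method array is ordered most specific first. */
typedef struct emethod emethod;
typedef int (*method_fn)(const emethod *em, size_t pos, int x); /* around, primary */
typedef void (*daemon_fn)(int x);                               /* before, after */

struct emethod {
  const method_fn *around;  size_t n_around;
  const daemon_fn *before;  size_t n_before;
  const method_fn *primary; size_t n_primary;
  const daemon_fn *after;   size_t n_after;
};

/* What the least specific around method's next_method does. */
static int run_core(const emethod *em, int x)
{
  size_t i;
  int r;

  for (i = 0; i < em->n_before; i++) em->before[i](x); /* most specific first */
  r = em->primary[0](em, 0, x);                        /* most specific primary */
  for (i = em->n_after; i-- > 0; ) em->after[i](x);    /* least specific first */
  return r;
}

/* Models next_method inside the around method at position pos. */
int call_next_around(const emethod *em, size_t pos, int x)
{
  if (pos + 1 < em->n_around) return em->around[pos + 1](em, pos + 1, x);
  return run_core(em, x);
}

/* Models next_method inside a primary method (a real one would be null
 * when there is no less specific primary method). */
int has_next_primary(const emethod *em, size_t pos)
  { return pos + 1 < em->n_primary; }
int call_next_primary(const emethod *em, size_t pos, int x)
{
  assert(has_next_primary(em, pos));
  return em->primary[pos + 1](em, pos + 1, x);
}

int effective_method(const emethod *em, int x)
{
  assert(em->n_primary > 0);       /* otherwise there is no effective method */
  return em->n_around ? em->around[0](em, 0, x) : run_core(em, x);
}

/* Demonstration: hypothetical class B is a subclass of class A. */
char trace[64];
static void note(const char *s) { strcat(trace, s); }
static void a_before(int x) { (void)x; note("Ab "); }
static void b_before(int x) { (void)x; note("Bb "); }
static void a_after(int x) { (void)x; note("Aa "); }
static void b_after(int x) { (void)x; note("Ba "); }
static int a_primary(const emethod *em, size_t pos, int x)
  { (void)em; (void)pos; note("Ap "); return x + 1; }
static int b_primary(const emethod *em, size_t pos, int x) /* extends */
  { note("Bp "); return 10 * call_next_primary(em, pos, x); }
static int a_around(const emethod *em, size_t pos, int x)
{
  int r;

  note("A[ ");
  r = call_next_around(em, pos, x);
  note("A] ");
  return r;
}

static const method_fn demo_around[] = { a_around };
static const daemon_fn demo_before[] = { b_before, a_before };
static const method_fn demo_primary[] = { b_primary, a_primary };
static const daemon_fn demo_after[] = { b_after, a_after };
const emethod demo = { demo_around, 1, demo_before, 2,
                       demo_primary, 2, demo_after, 2 };
```

In the demonstration, class @|B|'s primary method extends @|A|'s, so
sending the message with argument~4 returns $10 \times (4 + 1) = 50$. The
trace shows the single @|around| method starting first and finishing last,
the @|before| methods running most specific first, and the @|after| methods
least specific first.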

A typical use for @|around| methods is to allow a base class to set up the
dynamic environment appropriately for the primary methods of its subclasses,
e.g., by claiming a lock, and releasing it afterwards.

The @|next_method| function provided to methods with the primary and
@|around| rôles accepts the same arguments, and returns the same type, as the
message, except that one or two additional arguments are inserted at the
front of the argument list. The first additional argument is always the
receiving object, @|me|. If the message accepts a variable argument suffix,
then the second additional argument is a @|va_list|; otherwise there is no
second additional argument. In the former case, a variable
@|sod__master_ap| of type @|va_list| is defined, containing a separate copy
of the argument pointer (so the method body can process the variable argument
suffix itself, and still pass a fresh copy on to the next method).
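
The separate copy matters because a @|va_list| cannot be rewound once its
arguments have been consumed, and only the variadic function itself may
restart it with @|va_start|. The hypothetical sketch below (all names
invented; C99 @|va_copy| assumed) shows a method body taking a private copy
before consuming its own argument pointer, and then handing the untouched
copy on to a stand-in for the next method.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical stand-in for the next method: it sums n int arguments. */
int next_sum(int n, va_list ap)
{
  int total = 0;

  while (n-- > 0) total += va_arg(ap, int);
  return total;
}

/* Hypothetical method body: it reads the variable arguments itself to find
 * the largest, then passes a fresh copy on to the next method. */
int method_max_plus_sum(int n, va_list ap)
{
  va_list master_ap;
  int i, v, max = 0, r;

  va_copy(master_ap, ap);           /* untouched copy, like sod__master_ap */
  for (i = 0; i < n; i++) {         /* consumes this method's own ap */
    v = va_arg(ap, int);
    if (i == 0 || v > max) max = v;
  }
  r = max + next_sum(n, master_ap); /* the next method still sees every argument */
  va_end(master_ap);
  return r;
}

/* Hypothetical variadic entry point for the message. */
int send_max_plus_sum(int n, ...)
{
  va_list ap;
  int r;

  va_start(ap, n);
  r = method_max_plus_sum(n, ap);
  va_end(ap);
  return r;
}
```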

A method with the primary or @|around| rôle may use the convenience macro
@|CALL_NEXT_METHOD|, which takes no arguments itself, and simply calls
@|next_method| with appropriate arguments: the receiver @|me| pointer, the
argument pointer @|sod__master_ap| (if applicable), and the method's
arguments. If the method body has overwritten its formal arguments, then
@|CALL_NEXT_METHOD| will pass along the updated values, rather than the
original ones.

A primary or @|around| method which invokes its @|next_method| function is
said to \emph{extend} the message behaviour; a method which does not invoke
its @|next_method| is said to \emph{override} the behaviour. Note that a
method may make a decision to override or extend at runtime.

\subsubsection{Aggregating method combinations}
A number of other method combinations are provided. They are called
`aggregating' method combinations because, instead of invoking just the most
specific primary method, as the standard method combination does, they invoke
the applicable primary methods in turn and aggregate the return values from
each.

The aggregating method combinations accept the same four rôles as the
standard method combination, and @|around|, @|before|, and @|after| methods
work in the same way.

The aggregating method combinations provided are as follows.
\begin{description} \let\makelabel\code
\item[progn] The message must return @|void|. The applicable primary methods
are simply invoked in turn, most specific first.
\item[sum] The message must return a numeric type.\footnote{%
The Sod translator does not check this, since it doesn't have enough
insight into @|typedef| names.} %
The applicable primary methods are invoked in turn, and their return values
added up. The final result is the sum of the individual values.
\item[product] The message must return a numeric type. The applicable
primary methods are invoked in turn, and their return values multiplied
together. The final result is the product of the individual values.
\item[min] The message must return a scalar type. The applicable primary
methods are invoked in turn. The final result is the smallest of the
individual values.
\item[max] The message must return a scalar type. The applicable primary
methods are invoked in turn. The final result is the largest of the
individual values.
\item[and] The message must return a scalar type. The applicable primary
methods are invoked in turn. If any method returns zero then the final
result is zero and no further methods are invoked. If all of the
applicable primary methods return nonzero, then the final result is the
result of the last primary method.
\item[or] The message must return a scalar type. The applicable primary
methods are invoked in turn. If any method returns nonzero then the final
result is that nonzero value and no further methods are invoked. If all of
the applicable primary methods return zero, then the final result is zero.
\end{description}
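
As a hypothetical sketch, the @|sum|, @|max|, @|and|, and @|or| behaviours
can be modelled as loops over the applicable primary methods, most specific
first. The function names and the invocation counter are inventions for
illustration; the counter makes the short-circuiting of @|and| and @|or|
visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical primary methods for a message `int m(int x)', stored most
 * specific first. */
typedef int (*primary_fn)(int x);

int combine_sum(const primary_fn *m, size_t n, int x)
{
  int acc = 0;
  size_t i;

  for (i = 0; i < n; i++) acc += m[i](x);
  return acc;
}

int combine_max(const primary_fn *m, size_t n, int x)
{
  int acc, v;
  size_t i;

  assert(n > 0);                 /* otherwise there is no effective method */
  acc = m[0](x);
  for (i = 1; i < n; i++) { v = m[i](x); if (v > acc) acc = v; }
  return acc;
}

int combine_and(const primary_fn *m, size_t n, int x)
{
  int r = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    r = m[i](x);
    if (!r) return 0;            /* stop at the first zero */
  }
  return r;                      /* value from the last method */
}

int combine_or(const primary_fn *m, size_t n, int x)
{
  int r;
  size_t i;

  for (i = 0; i < n; i++) {
    r = m[i](x);
    if (r) return r;             /* stop at the first nonzero value */
  }
  return 0;
}

/* Demonstration methods; `calls' counts invocations. */
int calls;
int twice(int x) { calls++; return 2 * x; }
int zero(int x) { calls++; (void)x; return 0; }
int plus_one(int x) { calls++; return x + 1; }
const primary_fn demo[] = { twice, zero, plus_one };
```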

There is also a @|custom| aggregating method combination, which is described
in \xref{sec:fixme.custom-aggregating-method-combination}.


\subsection{Method entries} \label{sec:concepts.methods.entry}

Each instance is associated with its direct class. \fixme{direct instances}

The effective methods for each class are determined at translation time, by
the Sod translator. For each effective method, one or more \emph{method
entry functions} are constructed. A method entry function has three
responsibilities.
\begin{itemize}
\item It converts the receiver pointer to the correct type. Method entry
functions can perform these conversions extremely efficiently: there are
separate method entries for each chain of each class which can receive a
message, so method entry functions are in the privileged situation of
knowing the \emph{exact} class of the receiving object.
\item If the message accepts a variable-length argument tail, then two method
entry functions are created for each chain of each class: one receives a
variable-length argument tail, as intended, and captures it in a @|va_list|
object; the other accepts an argument of type @|va_list| in place of the
variable-length tail and arranges for it to be passed along to the direct
methods.
\item It invokes the effective method with the appropriate arguments. There
might or might not be an actual function corresponding to the effective
method itself: the translator may instead open-code the effective method's
behaviour into each method entry function; and the machinery for handling
`delegation chains', such as is used for @|around| methods and primary
methods in the standard method combination, is necessarily scattered among
a number of small functions.
\end{itemize}
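
The first responsibility can be illustrated with an invented two-chain
layout; this is a simplified model, not the layout Sod actually generates.
Because each entry function is installed only in the vtable for one chain of
one class, it can recover a pointer to the whole instance by subtracting a
constant offset known at compile time, with no runtime lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Invented two-chain instance layout: each chain begins with a pointer to
 * that chain's vtable.  Sod's real layouts are more elaborate. */
struct chain_a { const void *vt; int a; };
struct chain_b { const void *vt; int b; };
struct ilayout { struct chain_a ca; struct chain_b cb; };

/* Hypothetical effective method: it works on the whole instance. */
static int effective_total(struct ilayout *me) { return me->ca.a + me->cb.b; }

/* Hypothetical entry installed only in chain A's vtable for this class. */
int entry_via_a(struct chain_a *recv)
{
  struct ilayout *me =
    (struct ilayout *)((char *)recv - offsetof(struct ilayout, ca));
  return effective_total(me);
}

/* Hypothetical entry installed only in chain B's vtable: the offset is a
 * compile-time constant because the exact class is known. */
int entry_via_b(struct chain_b *recv)
{
  struct ilayout *me =
    (struct ilayout *)((char *)recv - offsetof(struct ilayout, cb));
  return effective_total(me);
}
```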


\subsection{Messages with keyword arguments}
\label{sec:concepts.methods.keywords}

A message or a direct method may declare that it accepts keyword arguments.
A message which accepts keyword arguments is called a \emph{keyword message};
a direct method which accepts keyword arguments is called a \emph{keyword
method}.

While method combinations may set their own rules, usually keyword methods
can only be defined on keyword messages, and all methods defined on a keyword
message must be keyword methods. The direct methods defined on a keyword
message may differ in the keywords they accept, both from each other, and
from the message. If two superclasses of some common class both define
keyword methods on the same message, and the methods both accept a keyword
argument with the same name, then these two keyword arguments must also have
the same type. Different applicable methods may declare keyword arguments
with the same name but different defaults; see below.

The keyword arguments acceptable in a message sent to an object are the
keywords listed in the message definition, together with all of the keywords
accepted by any applicable method. There is no easy way to determine at
runtime whether a particular keyword is acceptable in a message to a given
instance.

At runtime, a direct method which accepts one or more keyword arguments
receives an additional argument named @|suppliedp|. This argument is a small
structure. For each keyword argument named $k$ accepted by the direct
method, @|suppliedp| contains a one-bit-wide bitfield member of type
@|unsigned|, also named $k$. If a keyword argument named $k$ was passed in
the message, then @|suppliedp.$k$| is one, and $k$ contains the argument
value; otherwise @|suppliedp.$k$| is zero, and $k$ contains the default value
from the direct method definition if there was one, or an unspecified value
otherwise.
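
For example, a keyword method accepting keywords @|width| and @|height|
might receive a @|suppliedp| structure shaped like the one below. The
structure tag, the function name, and the choice that @|width| has a default
while @|height| does not are all hypothetical; the point is that a keyword
without a default must not be read unless its flag is set.

```c
#include <assert.h>

/* Hypothetical suppliedp structure for a keyword method accepting keywords
 * `width' (with a default) and `height' (with no default). */
struct area_suppliedp {
  unsigned width: 1;
  unsigned height: 1;
};

int area(struct area_suppliedp suppliedp, int width, int height)
{
  /* width holds either the sender's value or its default, so it is always
   * safe to read; height is unspecified unless it was supplied. */
  if (!suppliedp.height) height = width;   /* treat the shape as a square */
  return width * height;
}
```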

%%%--------------------------------------------------------------------------
\section{The object lifecycle} \label{sec:concepts.lifecycle}

\subsection{Creation} \label{sec:concepts.lifecycle.birth}

Construction of a new instance of a class involves three steps.
\begin{enumerate}
\item \emph{Allocation} arranges for there to be storage space for the
instance's slots and associated metadata.
\item \emph{Imprinting} fills in the instance's metadata, associating the
instance with its class.
\item \emph{Initialization} stores appropriate initial values in the
instance's slots, and possibly links it into any external data structures as
necessary.
\end{enumerate}
The \descref{SOD_DECL}[macro]{mac} handles constructing instances with
automatic storage duration (`on the stack'). Similarly, the
\descref{SOD_MAKE}[macro]{mac} and the \descref{sod_make}{fun} and
\descref{sod_makev}{fun} functions construct instances allocated from the
standard @|malloc| heap. Programmers can add support for other allocation
strategies by using the \descref{SOD_INIT}[macro]{mac} and the
\descref{sod_init}{fun} and \descref{sod_initv}{fun} functions, which package
up imprinting and initialization.

\subsubsection{Allocation}
Instances of most classes (specifically including those classes defined by
Sod itself) can be held in any storage of sufficient size. The in-memory
layout of an instance of some class~$C$ is described by the type @|struct
$C$__ilayout|, and if the relevant class is known at compile time then the
best way to discover the layout size is with the @|sizeof| operator. Failing
that, the size required to hold an instance of $C$ is available in a slot in
$C$'s class object, as @|$C$__class@->cls.initsz|.

It is not in general sufficient to declare, or otherwise allocate, an object
of the class type $C$. The class type only describes a single chain of the
object's layout. It is nearly always an error to use the class type as if it
is a \emph{complete type}, e.g., to declare objects or arrays of the class
type, or to enquire about its size or alignment requirements.

Instance layouts may be declared as objects with automatic storage duration
(colloquially, `allocated on the stack') or allocated dynamically, e.g.,
using @|malloc|. They may be included as members of structures or unions, or
elements of arrays. Sod's runtime system doesn't retain the addresses of
instances, so, for example, Sod doesn't make it any more difficult than
necessary to use fancy allocators which sometimes move objects around in
memory.

There isn't any way to discover the alignment required for a particular
class's instances at runtime; it's best to be conservative and assume that
the platform's strictest alignment requirement applies.

The following simple function correctly allocates and returns space for an
instance of a class, given a pointer to its class object @<cls>.
\begin{prog}
void *allocate_instance(const SodClass *cls) \\ \ind
\{ return malloc(cls@->cls.initsz); \}
\end{prog}

\subsubsection{Imprinting}
Once storage has been allocated, it must be \emph{imprinted} before it can be
used as an instance of a class, e.g., before any messages can be sent to it.

Imprinting an instance stores some metadata about its direct class in the
instance structure, so that the rest of the program (and Sod's runtime
library) can tell what sort of object it is, and how to use it.\footnote{%
Specifically, imprinting an instance's storage involves storing the
appropriate vtable pointers in the right places in it.} %
A class object's @|imprint| slot points to a function which will correctly
imprint storage for one of that class's instances.

Once an instance's storage has been imprinted, it is technically possible to
send messages to the instance; however, the instance's slots are still
uninitialized at this point, so the applicable methods are unlikely to do
anything useful unless they've been written specifically for the purpose.

The following simple function imprints storage at address @<p> as an instance
of a class, given a pointer to its class object @<cls>.
\begin{prog}
void imprint_instance(const SodClass *cls, void *p) \\ \ind
\{ cls@->cls.imprint(p); \}
\end{prog}
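
The two steps can be combined. Since a real class object has many more
slots, this sketch substitutes a hypothetical miniature class object holding
only an @|initsz| member and an @|imprint| function, together with an
invented single-chain layout whose only metadata is a vtable pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature class object with just the two members used in
 * the text; a real Sod class object has many more. */
struct mini_class {
  size_t initsz;                 /* bytes needed for one instance */
  void (*imprint)(void *p);      /* stores the instance's metadata */
};

/* Invented single-chain layout: the only metadata is a vtable pointer. */
struct point_vt { const char *name; };
struct point_ilayout { const struct point_vt *vt; int x, y; };

static const struct point_vt point_vtable = { "Point" };

static void point_imprint(void *p)
  { ((struct point_ilayout *)p)->vt = &point_vtable; }

const struct mini_class point_class =
  { sizeof(struct point_ilayout), point_imprint };

/* Allocate and imprint; the slots are left for initialization. */
void *make_imprinted(const struct mini_class *cls)
{
  void *p = malloc(cls->initsz);

  if (p) cls->imprint(p);
  return p;
}
```

Automatic storage works the same way in this model: declare a
@|struct point_ilayout| and pass its address to the @|imprint| function; the
slots still need initializing afterwards.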

\subsubsection{Initialization}
The final step for constructing a new instance is to \emph{initialize} it, to
establish the necessary invariants for the instance itself and the
environment in which it operates.

Details of initialization are necessarily class-specific, but typically it
involves setting the instance's slots to appropriate values, and possibly
linking it into some larger data structure to keep track of it. It is
possible for initialization methods to attempt to allocate resources, but
this must be done carefully: there is currently no way to report an error
from object initialization, so, if an allocation fails, the object must be
marked as incompletely initialized, and left in a state where it will be
safe to tear down later.

Initialization is performed by sending the imprinted instance an @|init|
message, defined by the @|SodObject| class. This message uses a nonstandard
method combination which works like the standard combination, except that the
\emph{default behaviour}, if there is no overriding method, is to initialize
the instance's slots, as described below, and to invoke each superclass's
initialization fragments. This default behaviour may be invoked multiple
times if some method calls its @|next_method| more than once, unless some
other method takes steps to prevent this.

Slots are initialized in a well-defined order.
\begin{itemize}
\item Slots defined by a more specific superclass are initialized after slots
defined by a less specific superclass.
\item Slots defined by the same class are initialized in the order in which
their definitions appear.
\end{itemize}

A class can define \emph{initialization fragments}: pieces of literal code to
be executed to set up a new instance. Each superclass's initialization
fragments are executed with @|me| bound to an instance pointer of the
appropriate superclass type, immediately after that superclass's slots (if
any) have been initialized; therefore, fragments defined by a more specific
superclass are executed after fragments defined by a less specific
superclass. A class may define more than one initialization fragment: the
fragments are executed in the order in which they appear in the class
definition. It is possible for an initialization fragment to use @|return|
or @|goto| for special control-flow effects, but this is not likely to be a
good idea.

The @|init| message accepts keyword arguments
(\xref{sec:concepts.methods.keywords}). The set of acceptable keywords is
determined by the applicable methods as usual, but also by the
\emph{initargs} defined by the receiving instance's class and its
superclasses, which are made available to slot initializers and
initialization fragments.

There are two kinds of initarg definitions. \emph{User initargs} are defined
by an explicit @|initarg| item appearing in a class definition: the item
defines a name, type, and (optionally) a default value for the initarg.
\emph{Slot initargs} are defined by attaching an @|initarg| property to a
slot or slot initializer item: the property's value determines the initarg's
name, while the type is taken from the underlying slot type; slot initargs do
not have default values. Both kinds define a \emph{direct initarg} for the
containing class.

Initargs are inherited. The \emph{applicable} direct initargs for an @|init|
effective method are those defined by the receiving object's class, and all
of its superclasses. Applicable direct initargs with the same name are
merged to form \emph{effective initargs}. An error is reported if two
applicable direct initargs have the same name but different types. The
default value of an effective initarg is taken from the most specific
applicable direct initarg which specifies a default value; if no applicable
direct initarg specifies a default value then the effective initarg has no
default.
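
The merging rule can be sketched as a single pass over the applicable direct
initargs, ordered most specific first. This is a hypothetical model: the
structure, the function @|merge_initarg|, and the use of strings to stand for
C types are all inventions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one applicable direct initarg.  Types are
 * represented as strings purely for illustration. */
struct direct_initarg {
  const char *name, *type;
  int has_default, default_value;
};

/* Merge the direct initargs called `name', given most specific first.
 * Returns -1 on a type clash, 0 if there is no such initarg, or 1 on
 * success, setting *has_default and, if that is set, *dflt. */
int merge_initarg(const struct direct_initarg *v, size_t n,
                  const char *name, int *has_default, int *dflt)
{
  const char *type = NULL;
  size_t i;

  *has_default = 0;
  for (i = 0; i < n; i++) {
    if (strcmp(v[i].name, name) != 0) continue;
    if (!type) type = v[i].type;
    else if (strcmp(type, v[i].type) != 0) return -1; /* same name, new type */
    if (!*has_default && v[i].has_default) {          /* most specific wins */
      *has_default = 1;
      *dflt = v[i].default_value;
    }
  }
  return type ? 1 : 0;
}
```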

All initarg values are made available at runtime to user code --
initialization fragments and slot initializer expressions -- through local
variables and a @|suppliedp| structure, as in a direct method
(\xref{sec:concepts.methods.keywords}). Furthermore, slot initarg
definitions influence the initialization of slots.

The process for deciding how to initialize a particular slot works as
follows.
\begin{enumerate}
\item If there are any slot initargs defined on the slot, or any of its slot
initializers, \emph{and} the sender supplied a value for one or more of the
corresponding effective initargs, then the value of the most specific slot
initarg is stored in the slot.
\item Otherwise, if there are any slot initializers defined which include an
initializer expression, then the initializer expression from the most
specific such slot initializer is evaluated and its value stored in the
slot.
\item Otherwise, the slot is left uninitialized.
\end{enumerate}
Note that the default values (if any) of effective initargs do \emph{not}
affect this procedure.
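
The procedure can be sketched as follows, assuming that `the most specific
slot initarg' in the first step means the most specific one for which the
sender actually supplied a value. The structures and names are hypothetical,
both arrays are ordered most specific first, and initarg defaults
deliberately play no part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-slot inputs, each array ordered most specific first. */
struct slot_initarg { int supplied; int value; };   /* slot initargs */
struct slot_init { int (*expr)(void); };            /* NULL: no expression */

/* Apply the three steps to one slot; returns 1 if the slot was set. */
int init_slot(int *slot,
              const struct slot_initarg *ia, size_t n_ia,
              const struct slot_init *si, size_t n_si)
{
  size_t i;

  for (i = 0; i < n_ia; i++)         /* step 1: a supplied slot initarg */
    if (ia[i].supplied) { *slot = ia[i].value; return 1; }
  for (i = 0; i < n_si; i++)         /* step 2: an initializer expression */
    if (si[i].expr) { *slot = si[i].expr(); return 1; }
  return 0;                          /* step 3: leave the slot alone */
}

/* A hypothetical initializer expression for the demonstration. */
int seven(void) { return 7; }
```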
| 1048 | |
| 1049 | |
| 1050 | \subsection{Destruction} |
| 1051 | \label{sec:concepts.lifecycle.death} |
| 1052 | |
| 1053 | Destruction of an instance, when it is no longer required, consists of two |
| 1054 | steps. |
| 1055 | \begin{enumerate} |
| 1056 | \item \emph{Teardown} releases any resources held by the instance and |
| 1057 | disentangles it from any external data structures. |
| 1058 | \item \emph{Deallocation} releases the memory used to store the instance so |
| 1059 | that it can be reused. |
| 1060 | \end{enumerate} |
| 1061 | Teardown alone, for objects which require special deallocation, or for which |
| 1062 | deallocation occurs automatically (e.g., instances with automatic storage |
| 1063 | duration, or instances whose storage will be garbage-collected), is performed |
| 1064 | using the \descref{sod_teardown}[function]{fun}. Destruction of instances |
| 1065 | allocated from the standard @|malloc| heap is done using the |
| 1066 | \descref{sod_destroy}[function]{fun}. |
| 1067 | |
| 1068 | \subsubsection{Teardown} |
| 1069 | Details of teardown are necessarily class-specific, but typically it |
| 1070 | involves releasing resources held by the instance, and disentangling it from |
| 1071 | any data structures it might be linked into. |
| 1072 | |
| 1073 | Teardown is performed by sending the instance the @|teardown| message, |
| 1074 | defined by the @|SodObject| class. The message returns an integer, used as a |
| 1075 | boolean flag. If the message returns zero, then the instance's storage |
should be deallocated. If the message returns nonzero, then it is safe for
the caller to forget about the instance, but the caller should not deallocate its storage.
| 1078 | This is \emph{not} an error return: if some teardown method fails then the |
| 1079 | program may be in an inconsistent state and should not continue. |
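The caller's side of this protocol can be sketched as follows. The @|struct object| type, its @|teardown| hook, @|destroy_object|, and the two example hooks are hypothetical stand-ins invented for illustration. They are not the signatures of the real @|sod_teardown| and @|sod_destroy| functions. In keeping with the rule above, the return value is treated purely as a `keep or free' flag, never as an error indication.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an instance: only a teardown hook. */
struct object {
  /* Returns zero if the storage should now be released, or nonzero if the
   * caller may forget the instance but must not free its storage. */
  int (*teardown)(struct object *me);
};

/* Example hooks: one lets the storage go, one keeps it alive. */
int teardown_release(struct object *me) { (void)me; return 0; }
int teardown_keep(struct object *me) { (void)me; return 1; }

/* Sketch of destroying a malloc-allocated instance under the protocol
 * above.  Returns 1 if the storage was freed, 0 if it was left alone. */
int destroy_object(struct object *obj)
{
  assert(obj != NULL);
  if (obj->teardown(obj))
    return 0;               /* someone else still owns the storage */
  free(obj);
  return 1;
}
```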
| 1080 | |
| 1081 | This simple protocol can be used, for example, to implement a reference |
| 1082 | counting system, as follows. |
| 1083 | \begin{prog} |
| 1084 | [nick = ref] \\ |
| 1085 | class ReferenceCountedObject: SodObject \{ \\ \ind |
| 1086 | unsigned nref = 1; \\- |
| 1087 | void inc() \{ me@->ref.nref++; \} \\- |
| 1088 | [role = around] \\ |
| 1089 | int obj.teardown() \\ |
| 1090 | \{ \\ \ind |
| 1091 | if (--\,--me@->ref.nref) return (1); \\ |
| 1092 | else return (CALL_NEXT_METHOD); \-\\ |
| 1093 | \} \-\\ |
| 1094 | \} |
| 1095 | \end{prog} |
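For readers less familiar with Sod's method combination, here is a rough plain-C rendering of the @|around| method above. It is only a sketch under stated assumptions. The @|struct refobj| type and the functions @|ref_inc|, @|ref_teardown| and @|next_teardown| are hypothetical. @|next_teardown| merely stands in for @|CALL_NEXT_METHOD|: it records that the default teardown ran and returns zero.

```c
#include <assert.h>

/* Hypothetical plain-C model of ReferenceCountedObject. */
struct refobj {
  unsigned nref;   /* starts at 1, like the slot initializer */
  int torn_down;   /* set when the default teardown runs */
};

/* Stands in for CALL_NEXT_METHOD: the default behaviour would run the
 * teardown fragments and return zero. */
static int next_teardown(struct refobj *me)
{
  me->torn_down = 1;
  return 0;
}

void ref_inc(struct refobj *me) { me->nref++; }

/* The around method: while references remain, report "keep the storage". */
int ref_teardown(struct refobj *me)
{
  assert(me->nref > 0);
  if (--me->nref) return 1;
  return next_teardown(me);
}
```

Each holder calls @|ref_teardown| when it is done; only the final call reaches the default behaviour and returns zero, so only then is the storage released.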
| 1096 | |
| 1097 | The @|teardown| message uses a nonstandard method combination which works |
| 1098 | like the standard combination, except that the \emph{default behaviour}, if |
there is no overriding method, is to execute the superclasses' teardown
fragments and to return zero. This default behaviour may be invoked
multiple times if some method calls its @|next_method| more than once,
unless some other method takes steps to prevent this.
| 1103 | |
| 1104 | A class can define \emph{teardown fragments}: pieces of literal code to be |
| 1105 | executed to shut down an instance. Each superclass's teardown fragments are |
| 1106 | executed with @|me| bound to an instance pointer of the appropriate |
| 1107 | superclass type; fragments defined by a more specific superclass are executed |
| 1108 | before fragments defined by a less specific superclass. A class may define |
| 1109 | more than one teardown fragment: the fragments are executed in the order in |
which they appear in the class definition. It is possible for a
teardown fragment to use @|return| or @|goto| for special control-flow
| 1112 | effects, but this is not likely to be a good idea. Similarly, it's probably |
| 1113 | a better idea to use an @|around| method to influence the return value than |
| 1114 | to write an explicit @|return| statement in a teardown fragment. |
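The ordering rules for teardown fragments can be illustrated with a small model. Everything here is hypothetical. A fragment is represented by a label rather than code, and @|struct klass| and @|run_teardown_fragments| are invented names. The class precedence list is assumed to be given with the most specific class first. The sketch only records the order in which fragments would run. It does not model rebinding @|me| to each superclass's instance pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGMENTS 4

/* Hypothetical model: each class lists its teardown fragments in the
 * order they appear in its definition; a fragment is just a label here. */
struct klass {
  const char *name;
  size_t nfrag;
  const char *frag[MAX_FRAGMENTS];
};

/* Record, into out[0..outmax), the fragments that would run for an
 * instance whose class precedence list is cpl[0..n), most specific class
 * first.  Returns the number of fragments recorded. */
size_t run_teardown_fragments(const struct klass *const *cpl, size_t n,
                              const char **out, size_t outmax)
{
  size_t i, j, k = 0;

  assert(out != NULL || outmax == 0);
  for (i = 0; i < n; i++)                 /* more specific classes first */
    for (j = 0; j < cpl[i]->nfrag; j++)   /* definition order within a class */
      if (k < outmax) out[k++] = cpl[i]->frag[j];
  return k;
}
```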
| 1115 | |
| 1116 | \subsubsection{Deallocation} |
| 1117 | The details of instance deallocation are obviously specific to the allocation |
strategy used by the instance, and this is often orthogonal to the object's
| 1119 | class. |
| 1120 | |
The code which decides to destroy an object is often unaware
| 1122 | of the object's direct class. Low-level details of deallocation often |
| 1123 | require the proper base address of the instance's storage, which can be |
| 1124 | determined using the \descref{SOD_INSTBASE}[macro]{mac}. |
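To see why the base address matters, consider a hypothetical instance layout. The instance is made of several parts, each beginning with a vtable pointer, and each vtable records the offset of its part from the start of the instance. A pointer to a non-initial part is not the address that @|malloc| returned, so passing it to @|free| would be wrong. The sketch below recovers the base address from any part pointer, in the spirit of @|SOD_INSTBASE|. The layout, @|struct part|, @|instance_base| and @|make_instance| are assumptions for illustration, not Sod's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: each part of an instance starts with a vtable
 * pointer, and each vtable records its part's offset within the instance. */
struct vtable { size_t base_offset; };
struct part { const struct vtable *vt; };

struct instance {
  struct part first;
  int a;
  struct part second;   /* a pointer to this is NOT the allocation base */
  int b;
};

static const struct vtable first_vt = { offsetof(struct instance, first) };
static const struct vtable second_vt = { offsetof(struct instance, second) };

/* Recover the start of the whole instance from a pointer to any part. */
void *instance_base(struct part *p)
{
  assert(p != NULL);
  return (char *)p - p->vt->base_offset;
}

/* Allocate an instance and point each part at its vtable. */
struct instance *make_instance(void)
{
  struct instance *in = malloc(sizeof *in);
  if (in) {
    in->first.vt = &first_vt;
    in->second.vt = &second_vt;
  }
  return in;
}
```

Code holding only @|&in->second| can then release the storage correctly with @|free(instance_base(p))|.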
| 1125 | |
| 1126 | %%%-------------------------------------------------------------------------- |
| 1127 | \section{Metaclasses} \label{sec:concepts.metaclasses} |
| 1128 | |
| 1129 | In Sod, every object is an instance of some class, and -- unlike, say, |
| 1130 | \Cplusplus\ -- classes are proper objects. It follows that, in Sod, every |
| 1131 | class~$C$ is itself an instance of some class~$M$, which is called $C$'s |
| 1132 | \emph{metaclass}. Metaclass instances are usually constructed statically, at |
| 1133 | compile time, and marked read-only. |
| 1134 | |
| 1135 | As an added complication, Sod classes, and other metaobjects such as |
| 1136 | messages, methods, slots and so on, also have classes \emph{at translation |
| 1137 | time}. These translation-time metaclasses are not Sod classes; they are CLOS |
| 1138 | classes, implemented in Common Lisp. |
| 1139 | |
| 1140 | |
| 1141 | \subsection{Runtime metaclasses} |
| 1142 | \label{sec:concepts.metaclasses.runtime} |
| 1143 | |
| 1144 | Like other classes, metaclasses can declare messages, and define slots and |
| 1145 | methods. Slots defined by the metaclass are called \emph{class slots}, as |
| 1146 | opposed to \emph{instance slots}. Similarly, messages and methods defined by |
| 1147 | the metaclass are termed \emph{class messages} and \emph{class methods} |
| 1148 | respectively, though these are used much less frequently. |
| 1149 | |
| 1150 | \subsubsection{The braid} |
| 1151 | Every object is an instance of some class. There are only finitely many |
| 1152 | classes. |
| 1153 | |
| 1154 | \begin{figure} |
| 1155 | \centering |
| 1156 | \begin{tikzpicture} |
| 1157 | \node[lit] (obj) {SodObject}; |
| 1158 | \node[lit] (cls) [right=10mm of obj] {SodClass}; |
| 1159 | \draw [->, dashed] (obj) to[bend right] (cls); |
| 1160 | \draw [->] (cls) to[bend right] (obj); |
| 1161 | \draw [->, dashed] (cls) to[loop right] (cls); |
| 1162 | \end{tikzpicture} |
| 1163 | \qquad |
| 1164 | \fbox{\ \begin{tikzpicture} |
| 1165 | \node (subclass) {subclass of}; |
| 1166 | \node (instance) [below=\jot of subclass] {instance of}; |
| 1167 | \draw [->] ($(subclass.west) - (10mm, 0)$) -- ++(8mm, 0); |
| 1168 | \draw [->, dashed] ($(instance.west) - (10mm, 0)$) -- ++(8mm, 0); |
| 1169 | \end{tikzpicture}} |
| 1170 | \caption{The Sod braid} \label{fig:concepts.metaclasses.braid} |
| 1171 | \end{figure} |
| 1172 | |
| 1173 | Consider the directed graph whose nodes are classes, and where there is an |
| 1174 | arc from $C$ to $D$ if and only if $C$ is an instance of $D$. There are only |
finitely many nodes. Every node has an arc leaving it, because every object
-- and hence every class -- is an instance of some class. Therefore, if we
start at any node and repeatedly follow arcs, we must eventually revisit some
node, so the graph must contain at least one cycle.
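The counting argument can be made concrete. The sketch below uses hypothetical names and represents the graph as an array @|inst|, where @|inst[i]| is the class that node~$i$ is an instance of. After $n$ steps from any start node we have visited $n + 1$ nodes. Some node must therefore have repeated, so the walk is already on a cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Follow "instance of" arcs in a finite graph of n nodes where every node
 * has exactly one outgoing arc, given as inst[i].  After n steps from any
 * start, some node has repeated, so the walk is on a cycle.  Returns a
 * node on that cycle and stores the cycle's length in *len. */
size_t find_cycle(const size_t *inst, size_t n, size_t start, size_t *len)
{
  size_t i, x = start, y;

  assert(n > 0 && start < n && len != NULL);
  for (i = 0; i < n; i++)
    x = inst[x];                 /* x now lies on a cycle */

  /* Walk once around the cycle to measure it. */
  y = inst[x];
  *len = 1;
  while (y != x) {
    y = inst[y];
    (*len)++;
  }
  return x;
}
```

Encoding @|SodObject| as node~0 and @|SodClass| as node~1, with both nodes pointing at node~1, the cycle found is @|SodClass|'s arc to itself.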
| 1178 | |
| 1179 | In Sod, this situation is resolved in the simplest manner possible: |
| 1180 | @|SodClass| is the only predefined metaclass, and it is an instance of |
| 1181 | itself. The only other predefined class is @|SodObject|, which is also an |
| 1182 | instance of @|SodClass|. There is exactly one root class, namely |
| 1183 | @|SodObject|; consequently, @|SodClass| is a direct subclass of @|SodObject|. |
| 1184 | |
| 1185 | \Xref{fig:concepts.metaclasses.braid} shows a diagram of this situation. |
| 1186 | |
| 1187 | \subsubsection{Class slots and initializers} |
| 1188 | Instance initializers were described in \xref{sec:concepts.classes.slots}. A |
| 1189 | class can also define \emph{class initializers}, which provide values for |
| 1190 | slots defined by its metaclass. The initial value for a class slot is |
| 1191 | determined as follows. |
| 1192 | \begin{itemize} |
\item Slots with nonstandard slot classes may be initialized by custom Lisp code. For
| 1194 | example, all of the slots defined by @|SodClass| are of this kind. User |
| 1195 | initializers are not permitted for such slots. |
\item If the class or any of its superclasses defines a class initializer for
the slot, then the class initializer defined by the most specific such
class is used.
\item Otherwise, if the metaclass or one of its superclasses defines an
instance initializer, then the instance initializer defined by the most
specific such class is used.
| 1202 | \item Otherwise there is no initializer, and an error will be reported. |
| 1203 | \end{itemize} |
| 1204 | Initializers for class slots must be constant expressions (for scalar slots) |
| 1205 | or aggregate initializers containing constant expressions. |
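The selection order above can be expressed as a short C function. This is a hypothetical model rather than the translator's actual Lisp implementation. @|struct init_src|, @|choose_class_slot_init| and its result codes are invented names. Each precedence list is assumed to be ordered most specific first, with the class itself at the front, and a constant initializer is represented by an @|int|.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-class record for a single class slot. */
struct init_src {
  bool has_init;
  int value;        /* stands in for a constant initializer expression */
};

enum class_slot_result {
  SLOT_CUSTOM, SLOT_CLASS_INIT, SLOT_INSTANCE_INIT, SLOT_ERROR
};

/* cls_inits[i]: class initializer from the i-th class in the new class's
 * precedence list (the class itself first); meta_inits[j]: instance
 * initializer from the j-th class in the metaclass's precedence list. */
enum class_slot_result
choose_class_slot_init(bool nonstandard,
                       const struct init_src *cls_inits, size_t ncls,
                       const struct init_src *meta_inits, size_t nmeta,
                       int *out)
{
  size_t i;

  assert(out != NULL);
  if (nonstandard)
    return SLOT_CUSTOM;            /* filled in by custom translator code */
  for (i = 0; i < ncls; i++)       /* most specific class initializer */
    if (cls_inits[i].has_init) {
      *out = cls_inits[i].value;
      return SLOT_CLASS_INIT;
    }
  for (i = 0; i < nmeta; i++)      /* most specific instance initializer */
    if (meta_inits[i].has_init) {
      *out = meta_inits[i].value;
      return SLOT_INSTANCE_INIT;
    }
  return SLOT_ERROR;               /* no initializer: report an error */
}
```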
| 1206 | |
| 1207 | \subsubsection{Metaclass selection and consistency} |
| 1208 | Sod enforces a \emph{metaclass consistency rule}: if $C$ has metaclass $M$, |
then any subclass of $C$ must have a metaclass which is a subclass of $M$.
| 1210 | |
| 1211 | The definition of a new class can name the new class's metaclass explicitly, |
| 1212 | by defining a @|metaclass| property; the Sod translator will verify that the |
| 1213 | choice of metaclass is acceptable. |
| 1214 | |
| 1215 | If no @|metaclass| property is given, then the translator will select a |
| 1216 | default metaclass as follows. Let $C_1$, $C_2$, \dots, $C_n$ be the direct |
| 1217 | superclasses of the new class, and let $M_1$, $M_2$, \dots, $M_n$ be their |
| 1218 | respective metaclasses (not necessarily distinct). If there exists exactly |
| 1219 | one minimal metaclass $M_i$, i.e., there exists an $i$, with $1 \le i \le n$, |
| 1220 | such that $M_i$ is a subclass of every $M_j$, for $1 \le j \le n$, then $M_i$ |
| 1221 | is selected as the new class's metaclass. Otherwise the situation is |
ambiguous and an error will be reported. Usually, the ambiguity can be
resolved satisfactorily by defining a new class $M^*$ as a direct subclass of
each of the minimal metaclasses among the $M_j$, and naming $M^*$ explicitly
in the new class's @|metaclass| property.
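Default metaclass selection, together with the consistency check applied to an explicitly chosen metaclass, can be sketched as follows. The encoding is an assumption made for illustration: classes are small integers, and @|sub[a][b]| is true when $a$ is $b$ or a subclass of $b$. @|select_metaclass| and @|metaclass_consistent| are hypothetical names. Checking only the direct superclasses' metaclasses is enough provided those superclasses are themselves consistent, since the subclass relation is transitive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCLASSES 8

/* Hypothetical subclass relation over class indices: sub[a][b] is true if
 * a is b, or a is a direct or indirect subclass of b. */
typedef bool subclass_rel[NCLASSES][NCLASSES];

/* Given the metaclasses m[0..n) of the direct superclasses, return the one
 * that is a subclass of all of them, or -1 if there is none (ambiguous). */
int select_metaclass(subclass_rel sub, const int *m, size_t n)
{
  size_t i, j;

  assert(n > 0);
  for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
      if (!sub[m[i]][m[j]]) break;
    if (j == n) return m[i];      /* m[i] is below every m[j] */
  }
  return -1;
}

/* An explicitly chosen metaclass must be a subclass of the metaclass of
 * every direct superclass. */
bool metaclass_consistent(subclass_rel sub, int meta, const int *m, size_t n)
{
  size_t j;

  for (j = 0; j < n; j++)
    if (!sub[meta][m[j]]) return false;
  return true;
}
```

For example, if two direct superclasses have unrelated metaclasses, say @|MetaA| and @|MetaB|, then @|select_metaclass| reports the ambiguity. A class @|MetaAB| defined as a subclass of both passes @|metaclass_consistent| and can be named explicitly.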
| 1225 | |
| 1226 | |
| 1227 | \subsection{Translation-time metaobjects} |
| 1228 | \label{sec:concepts.metaclasses.compile-time} |
| 1229 | |
| 1230 | |
| 1231 | |
| 1232 | \fixme{unwritten} |
| 1233 | |
| 1234 | %%%-------------------------------------------------------------------------- |
| 1235 | \section{Compatibility considerations} \label{sec:concepts.compatibility} |
| 1236 | |
| 1237 | Sod doesn't make source-level compatibility especially difficult. As long as |
classes, slots, and messages don't change names or disappear, and slots and
| 1239 | messages retain their approximate types, everything will be fine. |
| 1240 | |
| 1241 | Binary compatibility is much more difficult. Unfortunately, Sod classes have |
| 1242 | rather fragile binary interfaces.\footnote{% |
| 1243 | Research suggestion: investigate alternative instance and vtable layouts |
| 1244 | which improve binary compatibility, probably at the expense of instance |
| 1245 | compactness, and efficiency of slot access and message sending. There may |
| 1246 | be interesting trade-offs to be made.} % |
| 1247 | |
| 1248 | If instances are allocated \fixme{incomplete} |
| 1249 | |
| 1250 | %%%----- That's all, folks -------------------------------------------------- |
| 1251 | |
| 1252 | %%% Local variables: |
| 1253 | %%% mode: LaTeX |
| 1254 | %%% TeX-master: "sod.tex" |
| 1255 | %%% TeX-PDF-mode: t |
| 1256 | %%% End: |