Commit | Line | Data |
---|---|---|
1f7d590d MW |
1 | %%% -*-latex-*- |
2 | %%% | |
3 | %%% Conceptual background | |
4 | %%% | |
5 | %%% (c) 2015 Straylight/Edgeware | |
6 | %%% | |
7 | ||
8 | %%%----- Licensing notice --------------------------------------------------- | |
9 | %%% | |
e0808c47 | 10 | %%% This file is part of the Sensible Object Design, an object system for C. |
1f7d590d MW |
11 | %%% |
12 | %%% SOD is free software; you can redistribute it and/or modify | |
13 | %%% it under the terms of the GNU General Public License as published by | |
14 | %%% the Free Software Foundation; either version 2 of the License, or | |
15 | %%% (at your option) any later version. | |
16 | %%% | |
17 | %%% SOD is distributed in the hope that it will be useful, | |
18 | %%% but WITHOUT ANY WARRANTY; without even the implied warranty of | |
19 | %%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
20 | %%% GNU General Public License for more details. | |
21 | %%% | |
22 | %%% You should have received a copy of the GNU General Public License | |
23 | %%% along with SOD; if not, write to the Free Software Foundation, | |
24 | %%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. | |
25 | ||
3cc520db | 26 | \chapter{Concepts} \label{ch:concepts} |
1f7d590d | 27 | |
3cc520db MW |
28 | %%%-------------------------------------------------------------------------- |
29 | \section{Operational model} \label{sec:concepts.model} | |
1f7d590d | 30 | |
3cc520db MW |
31 | The Sod translator runs as a preprocessor, similar in nature to the |
32 | traditional Unix \man{lex}{1} and \man{yacc}{1} tools. The translator reads | |
33 | a \emph{module} file containing class definitions and other information, and | |
34 | writes C~source and header files. The source files contain function | |
35 | definitions and static tables which are fed directly to a C~compiler; the | |
36 | header files contain declarations for functions and data structures, and are | |
37 | included by source files -- whether hand-written or generated by Sod -- which | |
38 | make use of the classes defined in the module. | |
1f7d590d | 39 | |
3cc520db MW |
40 | Sod is not like \Cplusplus: it makes no attempt to `enhance' the C language |
41 | itself. Sod module files describe classes, messages, methods, slots, and | |
42 | other kinds of object-system things, and some of these descriptions need to | |
43 | contain C code fragments, but this code is entirely uninterpreted by the Sod | |
44 | translator.\footnote{% | |
45 | As long as a code fragment broadly follows C's lexical rules, and properly | |
46 | matches parentheses, brackets, and braces, the Sod translator will copy it | |
47 | into its output unchanged. It might, in fact, be some other kind of C-like | |
48 | language, such as Objective~C or \Cplusplus. Or maybe even | |
49 | Objective~\Cplusplus, because if having an object system is good, then | |
50 | having three must be really awesome.} % | |
1f7d590d | 51 | |
3cc520db MW |
52 | The Sod translator is not a closed system. It is written in Common Lisp, and |
53 | can load extension modules which add new input syntax, output formats, or | |
54 | altered behaviour. The interface for writing such extensions is described in | |
55 | \xref{p:lisp}. Extensions can change almost all details of the Sod object | |
56 | system, so the material in this manual must be read with this in mind: this | |
57 | manual describes the base system as provided in the distribution. | |
58 | ||
59 | %%%-------------------------------------------------------------------------- | |
60 | \section{Modules} \label{sec:concepts.modules} | |
61 | ||
62 | A \emph{module} is the top-level syntactic unit of input to the Sod | |
63 | translator. As described above, given an input module, the translator | |
64 | generates C source and header files. | |
65 | ||
66 | A module can \emph{import} other modules. This makes the type names and | |
67 | classes defined in those other modules available to class definitions in the | |
68 | importing module. Sod's module system is intentionally very simple. There | |
69 | are no private declarations or attempts to hide things. | |
70 | ||
71 | As well as importing existing modules, a module can include a number of | |
72 | different kinds of \emph{items}: | |
73 | \begin{itemize} | |
74 | \item \emph{class definitions} describe new classes, possibly in terms of | |
75 | existing classes; | |
76 | \item \emph{type name declarations} introduce new type names to Sod's | |
77 | parser;\footnote{% | |
78 | This is unfortunately necessary because C syntax, upon which Sod's input | |
79 | language is based for obvious reasons, needs to treat type names | |
80 | differently from other kinds of identifiers.} % | |
81 | and | |
82 | \item \emph{code fragments} contain literal C code to be dropped into an | |
83 | appropriate place in an output file. | |
84 | \end{itemize} | |
85 | Each kind of item, and, indeed, a module as a whole, can have a collection of | |
86 | \emph{properties} associated with it. A property has a \emph{name} and a | |
87 | \emph{value}. Properties are an open-ended way of attaching additional | |
88 | information to module items, so extensions can make use of them without | |
89 | having to implement additional syntax. | |
90 | ||
91 | %%%-------------------------------------------------------------------------- | |
92 | \section{Classes, instances, and slots} \label{sec:concepts.classes} | |
93 | ||
94 | For the most part, Sod takes a fairly traditional view of what it means to be | |
95 | an object system. | |
96 | ||
97 | An \emph{object} maintains \emph{state} and exhibits \emph{behaviour}. An | |
98 | object's state is maintained in named \emph{slots}, each of which can store a | |
99 | C value of an appropriate (scalar or aggregate) type. An object's behaviour | |
100 | is stimulated by sending it \emph{messages}. A message has a name, and may | |
101 | carry a number of arguments, which are C values; sending a message may result | |
102 | in the state of the receiving object (or other objects) being changed, and a C | |
103 | value being returned to the sender. | |
104 | ||
105 | Every object is a (direct) instance of some \emph{class}. The class | |
106 | determines which slots its instances have, which messages its instances can | |
107 | be sent, and which methods are invoked when those messages are received. The | |
108 | Sod translator's main job is to read class definitions and convert them into | |
109 | appropriate C declarations, tables, and functions. An object cannot | |
110 | (usually) change its direct class, and the direct class of an object is not | |
111 | affected by, for example, the static type of a pointer to it. | |
112 | ||
0a2d4b68 | 113 | |
3cc520db MW |
114 | \subsection{Superclasses and inheritance} |
115 | \label{sec:concepts.classes.inherit} | |
116 | ||
117 | \subsubsection{Class relationships} | |
118 | Each class has zero or more \emph{direct superclasses}. | |
119 | ||
120 | A class with no direct superclasses is called a \emph{root class}. The Sod | |
121 | runtime library includes a root class named @|SodObject|; making new root | |
122 | classes is somewhat tricky, and won't be discussed further here. | |
123 | ||
124 | Classes can have more than one direct superclass, i.e., Sod supports | |
125 | \emph{multiple inheritance}. A Sod class definition for a class~$C$ lists | |
126 | the direct superclasses of $C$ in a particular order. This order is called | |
127 | the \emph{local precedence order} of $C$, and the list which consists of $C$ | |
128 | followed by $C$'s direct superclasses in local precedence order is called | |
129 | $C$'s \emph{local precedence list}. | |
130 | ||
131 | The multiple inheritance in Sod works similarly to multiple inheritance in | |
132 | Lisp-like languages, such as Common Lisp, EuLisp, Dylan, and Python, which is | |
133 | very different from how multiple inheritance works in \Cplusplus.\footnote{% | |
134 | The latter can be summarized as `badly'. By default in \Cplusplus, an | |
135 | instance receives an additional copy of a superclass's state for each path | |
136 | through the class graph from the instance's direct class to that | |
137 | superclass, though this behaviour can be overridden by declaring | |
138 | superclasses to be @|virtual|. Also, \Cplusplus\ offers only trivial | |
139 | method combination (\xref{sec:concepts.methods}), leaving programmers to | |
140 | deal with delegation manually and (usually) statically.} % | |
141 | ||
142 | If $C$ is a class, then the \emph{superclasses} of $C$ are | |
143 | \begin{itemize} | |
144 | \item $C$ itself, and | |
145 | \item the superclasses of each of $C$'s direct superclasses. | |
146 | \end{itemize} | |
147 | The \emph{proper superclasses} of a class $C$ are the superclasses of $C$ | |
148 | except for $C$ itself. If a class $B$ is a (direct, proper) superclass of | |
149 | $C$, then $C$ is a \emph{(direct, proper) subclass} of $B$. If $C$ is a root | |
150 | class then the only superclass of $C$ is $C$ itself, and $C$ has no proper | |
151 | superclasses. | |
152 | ||
153 | If an object is a direct instance of class~$C$ then the object is also an | |
154 | (indirect) instance of every superclass of $C$. | |
155 | ||
156 | If $C$ has a proper superclass $B$, then $B$ is not allowed to have $C$ as a | |
157 | direct superclass. In different terms, if we construct a graph, whose | |
158 | vertices are classes, and draw an edge from each class to each of its direct | |
159 | superclasses, then this graph must be acyclic. In yet other terms, the `is a | |
160 | superclass of' relation is a partial order on classes. | |
161 | ||
162 | \subsubsection{The class precedence list} | |
163 | This partial order is not quite sufficient for our purposes. For each class | |
164 | $C$, we shall need to extend it into a total order on $C$'s superclasses. | |
165 | This calculation is called \emph{superclass linearization}, and the result is | |
166 | a \emph{class precedence list}, which lists each of $C$'s superclasses | |
167 | exactly once. If a superclass $B$ precedes (resp.\ follows) some other | |
168 | superclass $A$ in $C$'s class precedence list, then we say that $B$ is a more | |
169 | (resp.\ less) \emph{specific} superclass of $C$ than $A$ is. | |
170 | ||
171 | The superclass linearization algorithm isn't fixed, and extensions to the | |
172 | translator can introduce new linearizations for special effects, but the | |
173 | following properties are expected to hold. | |
174 | \begin{itemize} | |
175 | \item The first class in $C$'s class precedence list is $C$ itself; i.e., | |
176 | $C$ is always its own most specific superclass. | |
177 | \item If $A$ and $B$ are both superclasses of $C$, and $A$ is a proper | |
178 | superclass of $B$ then $A$ appears after $B$ in $C$'s class precedence | |
179 | list, i.e., $B$ is a more specific superclass of $C$ than $A$ is. | |
180 | \end{itemize} | |
181 | The default linearization algorithm used in Sod is the \emph{C3} algorithm, | |
182 | which has a number of good properties described in~\cite{FIXME:C3}. | |
183 | It works as follows. | |
184 | \begin{itemize} | |
185 | \item A \emph{merge} of some number of input lists is a single list | |
186 | containing each item that is in any of the input lists exactly once, and no | |
187 | other items; if an item $x$ appears before an item $y$ in any input list, | |
188 | then $x$ also appears before $y$ in the merge. If a collection of lists | |
189 | have no merge then they are said to be \emph{inconsistent}. | |
190 | \item The class precedence list of a class $C$ is a merge of the local | |
191 | precedence list of $C$ together with the class precedence lists of each of | |
192 | $C$'s direct superclasses. | |
193 | \item If there are no such merges, then the definition of $C$ is invalid. | |
194 | \item Suppose that there are multiple candidate merges. Consider the | |
195 | earliest position in these candidate merges at which they disagree. The | |
196 | \emph{candidate classes} at this position are the classes appearing at this | |
197 | position in the candidate merges. Each candidate class must be a | |
781a8fbd | 198 | superclass of distinct direct superclasses of $C$, since otherwise the |
3cc520db MW |
199 | candidates would be ordered by their common subclass's class precedence |
200 | list. The class precedence list contains, at this position, that candidate | |
201 | class whose subclass appears earliest in $C$'s local precedence order. | |
202 | \end{itemize} | |
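The merge described above can be sketched in C. This is an illustration only, not the translator's implementation (which is written in Common Lisp). It uses the commonly stated form of C3: repeatedly take the head of the first input list that does not occur in the tail of any input list. That form is assumed here to agree with the tie-breaking rule given above. The names @|class_graph|, @|c3_linearize| and @|MAX_CLASSES| are hypothetical, classes are represented by small integers, and the class graph is assumed to be acyclic.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLASSES 16

/* Hypothetical class table: supers[c] lists the direct superclasses of
 * class c in local precedence order; nsupers[c] is how many there are. */
struct class_graph {
    int nsupers[MAX_CLASSES];
    int supers[MAX_CLASSES][MAX_CLASSES];
};

/* A list of class ids with a cursor marking the unconsumed part. */
struct seq { int item[MAX_CLASSES + 1]; int len, pos; };

/* Does class c occur in the tail (everything after the head) of s? */
static int in_tail(const struct seq *s, int c)
{
    for (int i = s->pos + 1; i < s->len; i++)
        if (s->item[i] == c) return 1;
    return 0;
}

/* Compute the class precedence list of class c into out[], which must
 * have room for MAX_CLASSES entries.  Returns the list length, or -1 if
 * the input lists are inconsistent.  Assumes the graph is acyclic. */
int c3_linearize(const struct class_graph *g, int c, int *out)
{
    struct seq in[MAX_CLASSES + 1];
    int nin = g->nsupers[c], n = 0;

    assert(nin < MAX_CLASSES);

    /* One input list per direct superclass: its own precedence list. */
    for (int i = 0; i < nin; i++) {
        in[i].len = c3_linearize(g, g->supers[c][i], in[i].item);
        if (in[i].len < 0) return -1;
        in[i].pos = 0;
    }
    /* Last input list: the direct superclasses in local precedence order. */
    for (int i = 0; i < nin; i++) in[nin].item[i] = g->supers[c][i];
    in[nin].len = nin;
    in[nin].pos = 0;

    out[n++] = c;               /* c is its own most specific superclass */
    for (;;) {
        int pick = -1, remaining = 0;
        for (int i = 0; i <= nin && pick < 0; i++) {
            if (in[i].pos >= in[i].len) continue;
            remaining = 1;
            int cand = in[i].item[in[i].pos], blocked = 0;
            for (int j = 0; j <= nin; j++)
                if (in_tail(&in[j], cand)) { blocked = 1; break; }
            if (!blocked) pick = cand;
        }
        if (pick < 0) return remaining ? -1 : n;
        out[n++] = pick;
        /* Consume the chosen class from the head of every list. */
        for (int i = 0; i <= nin; i++)
            if (in[i].pos < in[i].len && in[i].item[in[i].pos] == pick)
                in[i].pos++;
    }
}
```

For a diamond in which class 3 lists classes 1 and 2 as direct superclasses, and both of those have the root class 0 as their only direct superclass, the sketch yields the order 3, 1, 2, 0. A class listing 0 before 1 as direct superclasses is rejected, because 1 must precede its own superclass 0.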
203 | ||
204 | \subsubsection{Class links and chains} | |
205 | The definition for a class $C$ may distinguish one of its proper superclasses | |
206 | as being the \emph{link superclass} for class $C$. Not every class need have | |
207 | a link superclass, and the link superclass of a class $C$, if it exists, need | |
208 | not be a direct superclass of $C$. | |
209 | ||
210 | Superclass links must obey the following rule: if $C$ is a class, then there | |
781a8fbd MW |
211 | must be no three superclasses $X$, $Y$ and~$Z$ of $C$ such that $Z$ is the |
212 | link superclass of both $X$ and $Y$. As a consequence of this rule, the | |
3cc520db MW |
213 | superclasses of $C$ can be partitioned into linear \emph{chains}, such that |
214 | superclasses $A$ and $B$ are in the same chain if and only if one can trace a | |
215 | path from $A$ to $B$ by following superclass links, or \emph{vice versa}. | |
216 | ||
217 | Since a class links only to one of its proper superclasses, the classes in a | |
218 | chain are naturally ordered from most- to least-specific. The least specific | |
219 | class in a chain is called the \emph{chain head}; the most specific class is | |
220 | the \emph{chain tail}. Chains are often named after their chain head | |
221 | classes. | |
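As a rough illustration of the chain definitions above, the following C sketch follows superclass links through a hypothetical table. The array @|link|, the constant @|NO_LINK| and the function names are assumptions made for this example and are not part of Sod. The links are assumed to obey the rule above and to contain no cycles.

```c
#include <assert.h>

#define NO_LINK (-1)

/* Can class `to' be reached from class `from' by following zero or more
 * superclass links?  link[c] is c's link superclass, or NO_LINK. */
static int reaches(const int *link, int from, int to)
{
    for (int c = from; c != NO_LINK; c = link[c])
        if (c == to) return 1;
    return 0;
}

/* Two classes are in the same chain if one can trace a path from either
 * of them to the other by following superclass links. */
int same_chain(const int *link, int a, int b)
{
    return reaches(link, a, b) || reaches(link, b, a);
}

/* The chain head is the least specific class in the chain: keep
 * following links until reaching a class without a link superclass. */
int chain_head(const int *link, int c)
{
    while (link[c] != NO_LINK) c = link[c];
    return c;
}
```

With the table @|{ NO_LINK, 0, NO_LINK, 1 }|, classes 3, 1 and 0 form one chain headed by class 0, while class 2 forms a chain of its own.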
222 | ||
223 | \subsection{Names} | |
224 | \label{sec:concepts.classes.names} | |
225 | ||
226 | Classes have a number of other attributes: | |
227 | \begin{itemize} | |
228 | \item A \emph{name}, which is a C identifier. Class names must be globally | |
229 | unique. The class name is used in the names of a number of associated | |
230 | definitions, to be described later. | |
231 | \item A \emph{nickname}, which is also a C identifier. Unlike names, | |
232 | nicknames are not required to be globally unique. If $C$ is any class, | |
233 | then all the superclasses of $C$ must have distinct nicknames. | |
234 | \end{itemize} | |
235 | ||
0a2d4b68 | 236 | |
3cc520db MW |
237 | \subsection{Slots} \label{sec:concepts.classes.slots} |
238 | ||
239 | Each class defines a number of \emph{slots}. Much like a structure member, a | |
240 | slot has a \emph{name}, which is a C identifier, and a \emph{type}. Unlike | |
241 | many other object systems, different superclasses of a class $C$ can define | |
242 | slots with the same name without ambiguity, since slot references are always | |
243 | qualified by the defining class's nickname. | |
244 | ||
245 | \subsubsection{Slot initializers} | |
246 | As well as defining slot names and types, a class can also associate an | |
247 | \emph{initial value} with each slot defined by itself or one of its | |
248 | superclasses. A class $C$ provides an \emph{initialization function} (see | |
d24d47f5 MW |
249 | \xref{sec:concepts.lifecycle.birth}, and \xref{sec:structures.root.sodclass}) |
250 | which sets the slots of a \emph{direct} instance of the class to the correct | |
3cc520db MW |
251 | initial values. If several of $C$'s superclasses define initializers for the |
252 | same slot then the initializer from the most specific such class is used. If | |
253 | none of $C$'s superclasses define an initializer for some slot then that slot | |
781a8fbd | 254 | will be left uninitialized. |
3cc520db MW |
255 | |
256 | The initializer for a slot with scalar type may be any C expression. The | |
257 | initializer for a slot with aggregate type must contain only constant | |
258 | expressions if the generated code is expected to be processed by an | |
259 | implementation of C89. Initializers will be evaluated once each time an | |
260 | instance is initialized. | |
261 | ||
0a2d4b68 | 262 | |
3cc520db MW |
263 | \subsection{C language integration} \label{sec:concepts.classes.c} |
264 | ||
265 | For each class~$C$, the Sod translator defines a C type, the \emph{class | |
266 | type}, with the same name. This is the usual type used when considering an | |
267 | object as an instance of class~$C$. No entire object will normally have a | |
268 | class type,\footnote{% | |
269 | In general, a class type only captures the structure of one of the | |
270 | superclass chains of an instance. A full instance layout contains multiple | |
271 | chains. See \xref{sec:structures.layout} for the full details.} % | |
272 | so access to instances is almost always via pointers. | |
273 | ||
274 | \subsubsection{Access to slots} | |
275 | The class type for a class~$C$ is actually a structure. It contains one | |
276 | member for each class in $C$'s superclass chain, named with that class's | |
277 | nickname. Each of these members is also a structure, containing the | |
278 | corresponding class's slots, one member per slot. There's nothing special | |
279 | about these slot members: C code can access them in the usual way. | |
280 | ||
281 | For example, if @|MyClass| has the nickname @|mine|, and defines a slot @|x| | |
282 | of type @|int|, then the simple function | |
283 | \begin{prog} | |
c18d6aba | 284 | int get_x(MyClass *m) \{ return (m@->mine.x); \} |
3cc520db MW |
285 | \end{prog} |
286 | will extract the value of @|x| from an instance of @|MyClass|. | |
287 | ||
288 | All of this means that there's no such thing as `private' or `protected' | |
289 | slots. If you want to hide implementation details, the best approach is to | |
290 | stash them in a dynamically allocated private structure, and leave a pointer | |
291 | to it in a slot. (This will also help preserve binary compatibility, because | |
292 | the private structure can grow more members as needed. See | |
293 | \xref{sec:fixme.compatibility} for more details.) | |
294 | ||
295 | \subsubsection{Class objects} | |
296 | In Sod's object system, classes are objects too. Therefore classes are | |
297 | themselves instances; the class of a class is called a \emph{metaclass}. The | |
298 | consequences of this are explored in \xref{sec:concepts.metaclasses}. The | |
299 | \emph{class object} has the same name as the class, suffixed with | |
300 | `@|__class|'\footnote{% | |
301 | This is not quite true. @|$C$__class| is actually a macro. See | |
302 | \xref{sec:structures.layout.additional} for the gory details.} % | |
303 | and its type is usually @|SodClass|; @|SodClass|'s nickname is @|cls|. | |
304 | ||
305 | A class object's slots contain or point to useful information, tables and | |
306 | functions for working with that class's instances. (The @|SodClass| class | |
307 | doesn't define any messages, so it doesn't have any methods. In Sod, a class | |
308 | slot containing a function pointer is not at all the same thing as a method.) | |
309 | ||
3cc520db MW |
310 | \subsubsection{Conversions} |
311 | Suppose one has a value of type pointer to class type of some class~$C$, and | |
312 | wants to convert it to a pointer to class type of some other class~$B$. | |
313 | There are three main cases to distinguish. | |
314 | \begin{itemize} | |
315 | \item If $B$ is a superclass of~$C$, in the same chain, then the conversion | |
316 | is an \emph{in-chain upcast}. The conversion can be performed using the | |
317 | appropriate generated upcast macro (see below), or by simply casting the | |
318 | pointer, using C's usual cast operator (or the \Cplusplus\ @|static_cast<>| | |
319 | operator). | |
320 | \item If $B$ is a superclass of~$C$, in a different chain, then the | |
321 | conversion is a \emph{cross-chain upcast}. The conversion is more than a | |
322 | simple type change: the pointer value must be adjusted. If the direct | |
323 | class of the instance in question is not known, the conversion will require | |
324 | a lookup at runtime to find the appropriate offset by which to adjust the | |
325 | pointer. The conversion can be performed using the appropriate generated | |
326 | upcast macro (see below); the general case is handled by the macro | |
58f9b400 | 327 | \descref{SOD_XCHAIN}{mac}. |
3cc520db MW |
328 | \item If $B$ is a subclass of~$C$ then the conversion is a \emph{downcast}; |
329 | otherwise the conversion is a~\emph{cross-cast}. In either case, the | |
330 | conversion can fail: the object in question might not be an instance of~$B$ | |
58f9b400 MW |
331 | at all. The macro \descref{SOD_CONVERT}{mac} and the function |
332 | \descref{sod_convert}{fun} perform general conversions. They return a null | |
781a8fbd MW |
333 | pointer if the conversion fails. (They are therefore your analogue to the |
334 | \Cplusplus\ @|dynamic_cast<>| operator.) | |
3cc520db MW |
335 | \end{itemize} |
336 | The Sod translator generates macros for performing both in-chain and | |
337 | cross-chain upcasts. For each class~$C$, and each proper superclass~$B$ | |
338 | of~$C$, a macro is defined: given an argument of type pointer to class type | |
339 | of~$C$, it returns a pointer to the same instance, only with type pointer to | |
340 | class type of~$B$, adjusted as necessary in the case of a cross-chain | |
341 | conversion. The macro is named by concatenating | |
342 | \begin{itemize} | |
343 | \item the name of class~$C$, in upper case, | |
344 | \item the characters `@|__CONV_|', and | |
345 | \item the nickname of class~$B$, in upper case; | |
346 | \end{itemize} | |
347 | e.g., if $C$ is named @|MyClass|, and $B$'s name is @|SuperClass| with | |
348 | nickname @|super|, then the macro @|MYCLASS__CONV_SUPER| converts a | |
349 | @|MyClass~*| to a @|SuperClass~*|. See | |
350 | \xref{sec:structures.layout.additional} for the formal description. | |
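The difference between in-chain and cross-chain upcasts can be seen in a simplified, hand-written model. Everything below is an assumption made for illustration. The real generated structures also carry vtable pointers and are described in \xref{sec:structures.layout}. The macro bodies are not the ones Sod generates. Only the macro names follow the naming rule above, for hypothetical classes @|Base| (nickname @|base|), @|Mixin| (nickname @|mix|) and @|MyClass| (nickname @|mine|), where @|MyClass| is in @|Base|'s chain and @|Mixin| has a chain of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot structures, one per class. */
struct base_slots  { int id; };
struct mixin_slots { int flags; };
struct my_slots    { int x; };

/* Simplified class types: one member per class in the chain, named by
 * that class's nickname. */
typedef struct { struct base_slots base; } Base;
typedef struct { struct base_slots base; struct my_slots mine; } MyClass;
typedef struct { struct mixin_slots mix; } Mixin;

/* Simplified layout of a direct instance of MyClass: one part per chain. */
struct MyClass_instance { MyClass main_chain; Mixin mixin_chain; };

/* In-chain upcast: the address is unchanged, only the type differs. */
#define MYCLASS__CONV_BASE(p) ((Base *)(p))

/* Cross-chain upcast: the pointer moves from one chain to the other.
 * A fixed offset is only valid here because the direct class is assumed
 * to be MyClass; otherwise a runtime lookup would be needed. */
#define MYCLASS__CONV_MIX(p)                                             \
    ((Mixin *)((char *)(p)                                               \
               - offsetof(struct MyClass_instance, main_chain)           \
               + offsetof(struct MyClass_instance, mixin_chain)))

/* Example use: read a slot defined by Mixin through a MyClass pointer. */
int get_flags(MyClass *m) { return MYCLASS__CONV_MIX(m)->mix.flags; }
```

In this model the in-chain conversion compiles to nothing, while the cross-chain conversion is a pointer addition.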
351 | ||
352 | %%%-------------------------------------------------------------------------- | |
9e91c8e7 MW |
353 | \section{Keyword arguments} \label{sec:concepts.keywords} |
354 | ||
355 | In standard C, the actual arguments provided to a function are matched up | |
356 | with the formal arguments given in the function definition according to their | |
357 | ordering in a list. Unless the (rather cumbersome) machinery for dealing | |
358 | with variable-length argument tails (@|<stdarg.h>|) is used, exactly the | |
359 | correct number of arguments must be supplied, and in the correct order. | |
360 | ||
361 | A \emph{keyword argument} is matched by its distinctive \emph{name}, rather | |
362 | than by its position in a list. Keyword arguments may be \emph{omitted}, | |
363 | causing some default behaviour by the function. A function can detect | |
364 | whether a particular keyword argument was supplied: so the default behaviour | |
365 | need not be the same as that caused by any specific value of the argument. | |
366 | ||
367 | Keyword arguments can be provided in three ways. | |
368 | \begin{enumerate} | |
369 | \item Directly, as a variable-length argument tail, consisting (for the most | |
370 | part) of alternating keyword names, as pointers to null-terminated strings, | |
371 | and argument values, and terminated by a null pointer. This is somewhat | |
372 | error-prone, and the support library defines some macros which help ensure | |
373 | that keyword argument lists are well formed. | |
374 | \item Indirectly, through a @|va_list| object capturing a variable-length | |
375 | argument tail passed to some other function. Such indirect argument tails | |
376 | have the same structure as the direct argument tails described above. | |
377 | Because @|va_list| objects are hard to copy, the keyword-argument support | |
378 | library consistently passes @|va_list| objects \emph{by reference} | |
379 | throughout its programming interface. | |
380 | \item Indirectly, through a vector of @|struct kwval| objects, each of which | |
381 | contains a keyword name, as a pointer to a null-terminated string, and the | |
382 | \emph{address} of a corresponding argument value. (This indirection is | |
383 | necessary so that the items in the vector can be of uniform size.) | |
384 | Argument vectors are rather inconvenient to use, but are the only practical | |
385 | way in which a caller can decide at runtime which arguments to include in a | |
386 | call, which is useful when writing wrapper functions. | |
387 | \end{enumerate} | |
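The first of these forms, a variable-length tail of alternating names and values, can be parsed by hand with @|<stdarg.h>|. The sketch below does not use the support library's macros or types. The function @|parse_box_opts|, the structure @|box_opts| and the keyword names are hypothetical. Starting the tail with a named first keyword argument is a convenience of this example, not a statement about Sod's calling convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Hypothetical options record.  The *_given flags let the function tell
 * an omitted keyword apart from any particular supplied value. */
struct box_opts {
    int width, width_given;
    const char *label;
    int label_given;
};

/* Parse a tail of alternating keyword names and values, terminated by a
 * null pointer.  Returns 0 on success, or -1 on an unknown keyword. */
int parse_box_opts(struct box_opts *o, const char *first_kw, ...)
{
    va_list ap;
    int rc = 0;

    o->width = 0;  o->width_given = 0;
    o->label = 0;  o->label_given = 0;

    va_start(ap, first_kw);
    for (const char *kw = first_kw; kw; kw = va_arg(ap, const char *)) {
        if (strcmp(kw, "width") == 0) {
            o->width = va_arg(ap, int);
            o->width_given = 1;
        } else if (strcmp(kw, "label") == 0) {
            o->label = va_arg(ap, const char *);
            o->label_given = 1;
        } else {
            /* The type of the value is unknown, so it cannot be skipped. */
            rc = -1;
            break;
        }
    }
    va_end(ap);
    return rc;
}
```

A call might look like @|parse_box_opts(&o, "width", 42, (const char *)0)|. The terminator is written with an explicit cast because a bare @|0| in a variable-length tail need not have pointer type, which is one reason this style is error-prone.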
388 | ||
389 | Keyword arguments are provided as a general feature for C functions. | |
43073476 MW |
390 | However, Sod has special support for messages which accept keyword arguments |
391 | (\xref{sec:concepts.methods.keywords}). | |
9e91c8e7 MW |
392 | |
393 | %%%-------------------------------------------------------------------------- | |
3cc520db MW |
394 | \section{Messages and methods} \label{sec:concepts.methods} |
395 | ||
396 | Objects can be sent \emph{messages}. A message has a \emph{name}, and | |
397 | carries a number of \emph{arguments}. When an object is sent a message, a | |
398 | function, determined by the receiving object's class, is invoked, passing it | |
399 | the receiver and the message arguments. This function is called the | |
400 | class's \emph{effective method} for the message. The effective method can do | |
401 | anything a C function can do, including reading or updating program state or | |
402 | object slots, sending more messages, calling other functions, issuing system | |
403 | calls, or performing I/O; if it finishes, it may return a value, which is | |
404 | returned in turn to the message sender. | |
405 | ||
406 | The set of messages an object can receive, characterized by their names, | |
407 | argument types, and return type, is determined by the object's class. Each | |
408 | class can define new messages, which can be received by any instance of that | |
409 | class. The messages defined by a single class must have distinct names: | |
410 | there is no `function overloading'. As with slots | |
411 | (\xref{sec:concepts.classes.slots}), messages defined by distinct classes are | |
412 | always distinct, even if they have the same names: references to messages are | |
413 | always qualified by the defining class's name or nickname. | |
414 | ||
415 | Messages may take any number of arguments, of any non-array value type. | |
416 | Since message sends are effectively function calls, arguments of array type | |
417 | are implicitly converted to values of the corresponding pointer type. While | |
418 | message definitions may ascribe an array type to an argument, the formal | |
419 | argument will have pointer type, as is usual for C functions. A message may | |
420 | accept a variable-length argument suffix, denoted @|\dots|. | |
421 | ||
422 | A class definition may include \emph{direct methods} for messages defined by | |
423 | it or any of its superclasses. | |
424 | ||
425 | Like messages, direct methods define argument lists and return types, but | |
426 | they may also have a \emph{body}, and a \emph{role}. | |
427 | ||
428 | A direct method need not have the same argument list or return type as its | |
429 | message. The acceptable argument lists and return types for a method depend | |
430 | on the message, in particular its method combination | |
431 | (\xref{sec:concepts.methods.combination}), and the method's role. | |
432 | ||
433 | A direct method body is a block of C code, and the Sod translator usually | |
434 | defines, for each direct method, a function with external linkage, whose body | |
435 | contains a copy of the direct method body. Within the body of a direct | |
436 | method defined for a class $C$, the variable @|me|, of type pointer to class | |
437 | type of $C$, refers to the receiving object. | |
438 | ||
0a2d4b68 | 439 | |
3cc520db MW |
440 | \subsection{Effective methods and method combinations} |
441 | \label{sec:concepts.methods.combination} | |
442 | ||
443 | For each message a direct instance of a class might receive, there is a set | |
444 | of \emph{applicable methods}, which are exactly the direct methods defined on | |
445 | the object's class and its superclasses. These direct methods are combined | |
446 | together to form the \emph{effective method} for that particular class and | |
447 | message. Direct methods can be combined into an effective method in | |
448 | different ways, according to the \emph{method combination} specified by the | |
449 | message. The method combination determines which direct method roles are | |
450 | acceptable, and, for each role, the appropriate argument lists and return | |
451 | types. | |
452 | ||
453 | One direct method, $M$, is said to be more (resp.\ less) \emph{specific} than | |
454 | another, $N$, with respect to a receiving class~$C$, if the class defining | |
455 | $M$ is a more (resp.\ less) specific superclass of~$C$ than the class | |
456 | defining $N$. | |
457 | ||
43073476 | 458 | \subsubsection{The standard method combination} |
3cc520db MW |
459 | The default method combination is called the \emph{standard method |
460 | combination}; other method combinations are useful occasionally for special | |
461 | effects. The standard method combination accepts four direct method roles, | |
9761db0d | 462 | called `primary' (the default), @|before|, @|after|, and @|around|. |
3cc520db MW |
463 | |
464 | All direct methods subject to the standard method combination must have | |
465 | argument lists which \emph{match} the message's argument list: | |
466 | \begin{itemize} | |
467 | \item the method's arguments must have the same types as the message, though | |
468 | the arguments may have different names; and | |
469 | \item if the message accepts a variable-length argument suffix then the | |
470 | direct method must instead have a final argument of type @|va_list|. | |
471 | \end{itemize} | |
b1254eb6 MW |
472 | Primary and @|around| methods must have the same return type as the message; |
473 | @|before| and @|after| methods must return @|void| regardless of the | |
474 | message's return type. | |
3cc520db MW |
475 | |
476 | If there are no applicable primary methods then no effective method is | |
477 | constructed: the vtables contain null pointers in place of pointers to method | |
478 | entry functions. | |
479 | ||
480 | The effective method for a message with standard method combination works as | |
481 | follows. | |
482 | \begin{enumerate} | |
483 | ||
484 | \item If any applicable methods have the @|around| role, then the most | |
485 | specific such method, with respect to the class of the receiving object, is | |
486 | invoked. | |
487 | ||
b1254eb6 | 488 | Within the body of an @|around| method, the variable @|next_method| is |
3cc520db MW |
489 | defined, having pointer-to-function type. The method may call this |
490 | function, as described below, any number of times. | |
491 | ||
b1254eb6 MW |
492 | If there are any remaining @|around| methods, then @|next_method| invokes the |
493 | next most specific such method, returning whichever value that method | |
494 | returns; otherwise the behaviour of @|next_method| is to invoke the @|before| | |
495 | methods (if any), followed by the most specific primary method, followed by | |
496 | the @|after| methods (if any), and to return whichever value was returned | |
781a8fbd MW |
497 | by the most specific primary method, as described in the following items. |
498 | That is, the behaviour of the least specific @|around| method's | |
499 | @|next_method| function is exactly the behaviour that the effective method | |
500 | would have if there were no @|around| methods. Note that if the | |
501 | least-specific @|around| method calls its @|next_method| more than once | |
502 | then the whole sequence of @|before|, primary, and @|after| methods occurs | |
503 | multiple times. | |
3cc520db | 504 | |
b1254eb6 MW |
505 | The value returned by the most specific @|around| method is the value |
506 | returned by the effective method. | |
3cc520db MW |
507 | |
508 | \item If any applicable methods have the @|before| role, then they are all | |
509 | invoked, starting with the most specific. | |
510 | ||
511 | \item The most specific applicable primary method is invoked. | |
512 | ||
513 | Within the body of a primary method, the variable @|next_method| is | |
514 | defined, having pointer-to-function type. If there are no remaining less | |
515 | specific primary methods, then @|next_method| is a null pointer. | |
516 | Otherwise, the method may call the @|next_method| function any number of | |
517 | times. | |
518 | ||
519 | The behaviour of the @|next_method| function, if it is not null, is to | |
520 | invoke the next most specific applicable primary method, and to return | |
521 | whichever value that method returns. | |
522 | ||
b1254eb6 MW |
523 | If there are no applicable @|around| methods, then the value returned by |
524 | the most specific primary method is the value returned by the effective | |
525 | method; otherwise the value returned by the most specific primary method is | |
526 | returned to the least specific @|around| method, which called it via its | |
527 | own @|next_method| function. | |
3cc520db MW |
528 | |
529 | \item If any applicable methods have the @|after| role, then they are all | |
530 | invoked, starting with the \emph{least} specific. (Hence, the most | |
b1254eb6 | 531 | specific @|after| method is invoked with the most `afterness'.) |
3cc520db MW |
532 | |
533 | \end{enumerate} | |
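
The ordering described above can be modelled by a short C sketch.  It is an
illustration only, not the code which the Sod translator generates: the
structure @|applicable| and the function @|trace_standard_combination| are
hypothetical names, and the sketch assumes that every @|around| method calls
its @|next_method| exactly once and that no primary method calls its
@|next_method|.  It records the order in which method bodies are entered.

```c
#include <stddef.h>
#include <string.h>

enum { TRACE_MAX = 256 };

/* Names of the applicable methods for each role, most specific first.
 * (Hypothetical model; Sod itself has no such structure.)
 */
struct applicable {
  const char *const *around;  size_t n_around;
  const char *const *before;  size_t n_before;
  const char *const *primary; size_t n_primary;
  const char *const *after;   size_t n_after;
};

/* Append NAME and a space to TRACE, a buffer of TRACE_MAX bytes; names
 * which would not fit are silently dropped.
 */
static void note(char *trace, const char *name)
{
  if (strlen(trace) + strlen(name) + 2 <= TRACE_MAX) {
    strcat(trace, name);
    strcat(trace, " ");
  }
}

/* Write the order in which method bodies are entered to TRACE.  Return
 * zero if there are no applicable primary methods (so no effective method
 * would be constructed), or one otherwise.
 */
int trace_standard_combination(const struct applicable *m, char *trace)
{
  size_t i;

  trace[0] = 0;
  if (!m->n_primary) return (0);

  /* Around methods, most specific first, each calling its next_method. */
  for (i = 0; i < m->n_around; i++) note(trace, m->around[i]);
  /* Before methods, most specific first. */
  for (i = 0; i < m->n_before; i++) note(trace, m->before[i]);
  /* Only the most specific primary method is invoked directly. */
  note(trace, m->primary[0]);
  /* After methods, least specific first. */
  for (i = m->n_after; i > 0; i--) note(trace, m->after[i - 1]);
  return (1);
}
```

In this model, a less specific primary method appears in the trace only if
the sketch is extended to follow a primary method's own @|next_method| call.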
534 | ||
b1254eb6 MW |
535 | A typical use for @|around| methods is to allow a base class to set up the |
536 | dynamic environment appropriately for the primary methods of its subclasses, | |
537 | e.g., by claiming a lock, and restore it afterwards. | |
3cc520db | 538 | |
9761db0d | 539 | The @|next_method| function provided to methods with the primary and |
3cc520db MW |
540 | @|around| roles accepts the same arguments, and returns the same type, as the |
541 | message, except that one or two additional arguments are inserted at the | |
542 | front of the argument list. The first additional argument is always the | |
543 | receiving object, @|me|. If the message accepts a variable argument suffix, | |
544 | then the second additional argument is a @|va_list|; otherwise there is no | |
545 | second additional argument. In the former case, a variable | |
546 | @|sod__master_ap| of type @|va_list| is defined, containing a separate copy | |
547 | of the argument pointer (so the method body can process the variable argument | |
548 | suffix itself, and still pass a fresh copy on to the next method). | |
549 | ||
9761db0d | 550 | A method with the primary or @|around| role may use the convenience macro |
3cc520db MW |
551 | @|CALL_NEXT_METHOD|, which takes no arguments itself, and simply calls |
552 | @|next_method| with appropriate arguments: the receiver @|me| pointer, the | |
553 | argument pointer @|sod__master_ap| (if applicable), and the method's | |
554 | arguments. If the method body has overwritten its formal arguments, then | |
555 | @|CALL_NEXT_METHOD| will pass along the updated values, rather than the | |
556 | original ones. | |
557 | ||
781a8fbd MW |
558 | A primary or @|around| method which invokes its @|next_method| function is |
559 | said to \emph{extend} the message behaviour; a method which does not invoke | |
560 | its @|next_method| is said to \emph{override} the behaviour. Note that a | |
561 | method may make a decision to override or extend at runtime. | |
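
The distinction between extending and overriding can be sketched in plain C.
This is a model under stated assumptions, not Sod output: in a real method
body @|next_method| is a variable defined for you, whereas here it is passed
as an explicit parameter, and the names @|counter|, @|base_bump|, and
@|capped_bump| are invented for the illustration.

```c
#include <stddef.h>

/* Hypothetical receiver type for the illustration. */
struct counter { int value; int limit; };

/* Stand-in for the type of a primary method's next_method pointer. */
typedef int next_method_fn(struct counter *me, int delta);

/* Least specific primary method: always applies the change. */
int base_bump(struct counter *me, int delta)
{
  me->value += delta;
  return (me->value);
}

/* More specific primary method.  It decides at runtime whether to
 * override (return without calling next_method) or extend (call it).
 */
int capped_bump(struct counter *me, int delta, next_method_fn *next_method)
{
  /* Override: refuse changes which would exceed the limit. */
  if (me->value + delta > me->limit) return (me->value);

  /* A null next_method means there is no less specific primary method. */
  if (!next_method) return (me->value);

  /* Extend: delegate to the next most specific primary method. */
  return (next_method(me, delta));
}
```

Note the null check: a primary method which may be the least specific one
must not call @|next_method| unconditionally.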
562 | ||
43073476 | 563 | \subsubsection{Aggregating method combinations} |
3cc520db MW |
564 | A number of other method combinations are provided. They are called |
565 | `aggregating' method combinations because, instead of invoking just the most | |
566 | specific primary method, as the standard method combination does, they invoke | |
567 | the applicable primary methods in turn and aggregate the return values from | |
568 | each. | |
569 | ||
570 | The aggregating method combinations accept the same four roles as the | |
b1254eb6 MW |
571 | standard method combination, and @|around|, @|before|, and @|after| methods |
572 | work in the same way. | |
3cc520db MW |
573 | |
574 | The aggregating method combinations provided are as follows. | |
575 | \begin{description} \let\makelabel\code | |
576 | \item[progn] The message must return @|void|. The applicable primary methods | |
577 | are simply invoked in turn, most specific first. | |
578 | \item[sum] The message must return a numeric type.\footnote{% | |
579 | The Sod translator does not check this, since it doesn't have enough | |
580 | insight into @|typedef| names.} % | |
581 | The applicable primary methods are invoked in turn, and their return values | |
582 | added up. The final result is the sum of the individual values. | |
583 | \item[product] The message must return a numeric type. The applicable | |
584 | primary methods are invoked in turn, and their return values multiplied | |
585 | together. The final result is the product of the individual values. | |
586 | \item[min] The message must return a scalar type. The applicable primary | |
587 | methods are invoked in turn. The final result is the smallest of the | |
588 | individual values. | |
589 | \item[max] The message must return a scalar type. The applicable primary | |
590 | methods are invoked in turn. The final result is the largest of the | |
591 | individual values. | |
665a0455 MW |
592 | \item[and] The message must return a scalar type. The applicable primary |
593 | methods are invoked in turn. If any method returns zero then the final | |
594 | result is zero and no further methods are invoked. If all of the | |
595 | applicable primary methods return nonzero, then the final result is the | |
596 | result of the last primary method. | |
597 | \item[or] The message must return a scalar type. The applicable primary | |
598 | methods are invoked in turn. If any method returns nonzero then the final | |
599 | result is that nonzero value and no further methods are invoked. If all of | |
600 | the applicable primary methods return zero, then the final result is zero. | |
3cc520db MW |
601 | \end{description} |
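
The following C sketch models three of these combinations, to make the
short-circuit behaviour of @|and| and @|or| concrete.  It is not the code
generated by the Sod translator: the type @|primary_fn|, the sample methods,
and the @|combine_|\dots{} functions are hypothetical, and each function
assumes that it is given at least one primary method, listed most specific
first.

```c
#include <stddef.h>

/* Hypothetical stand-in for a primary method entry point: it takes the
 * receiver and returns the message's scalar result.
 */
typedef int primary_fn(int *me);

/* Sample primary methods; each counts its invocation in the receiver. */
int returns_three(int *me) { ++*me; return (3); }
int returns_zero(int *me) { ++*me; return (0); }
int returns_seven(int *me) { ++*me; return (7); }

/* `sum': invoke every primary method and add up the results. */
int combine_sum(primary_fn *const *m, size_t n, int *me)
{
  int total = 0;
  size_t i;

  for (i = 0; i < n; i++) total += m[i](me);
  return (total);
}

/* `and': stop at the first zero; otherwise return the last result. */
int combine_and(primary_fn *const *m, size_t n, int *me)
{
  int r = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    r = m[i](me);
    if (!r) return (0);
  }
  return (r);
}

/* `or': stop at, and return, the first nonzero result. */
int combine_or(primary_fn *const *m, size_t n, int *me)
{
  int r;
  size_t i;

  for (i = 0; i < n; i++) {
    r = m[i](me);
    if (r) return (r);
  }
  return (0);
}
```

Given the methods in the order @|returns_three|, @|returns_zero|,
@|returns_seven|, the @|and| model stops after the second method and the
@|or| model stops after the first.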
602 | ||
603 | There is also a @|custom| aggregating method combination, which is described | |
604 | in \xref{sec:fixme.custom-aggregating-method-combination}. | |
605 | ||
43073476 MW |
606 | |
607 | \subsection{Messages with keyword arguments} | |
608 | \label{sec:concepts.methods.keywords} | |
609 | ||
610 | A message or a direct method may declare that it accepts keyword arguments. | |
611 | A message which accepts keyword arguments is called a \emph{keyword message}; | |
612 | a direct method which accepts keyword arguments is called a \emph{keyword | |
613 | method}. | |
614 | ||
615 | While method combinations may set their own rules, usually keyword methods | |
616 | can only be defined on keyword messages, and all methods defined on a keyword | |
617 | message must be keyword methods. The direct methods defined on a keyword | |
618 | message may differ in the keywords they accept, both from each other, and | |
619 | from the message. If two superclasses of some common class both define | |
620 | keyword methods on the same message, and the methods both accept a keyword | |
621 | argument with the same name, then these two keyword arguments must also have | |
622 | the same type. Different applicable methods may declare keyword arguments | |
623 | with the same name but different defaults; see below. | |
624 | ||
625 | The keyword arguments acceptable in a message sent to an object are the | |
626 | keywords listed in the message definition, together with all of the keywords | |
627 | accepted by any applicable method. There is no easy way to determine at | |
628 | runtime whether a particular keyword is acceptable in a message to a given | |
629 | instance. | |
630 | ||
631 | At runtime, a direct method which accepts one or more keyword arguments | |
632 | receives an additional argument named @|suppliedp|. This argument is a small | |
633 | structure. For each keyword argument named $k$ accepted by the direct | |
634 | method, @|suppliedp| contains a one-bit-wide bitfield member of type | |
635 | @|unsigned|, also named $k$. If a keyword argument named $k$ was passed in | |
636 | the message, then @|suppliedp.$k$| is one, and $k$ contains the argument | |
637 | value; otherwise @|suppliedp.$k$| is zero, and $k$ contains the default value | |
638 | from the direct method definition if there was one, or an unspecified value | |
639 | otherwise. | |
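
As a rough illustration, suppose a direct method accepts keyword arguments
named @|colour| and @|width|, and declares no default for @|width|.  The
sketch below models what such a method sees; the structure name
@|suppliedp_model| and the function @|pen_width| are invented here, and the
real structure type is produced by the Sod translator.

```c
/* Hypothetical model of the `suppliedp' argument: one single-bit
 * `unsigned' bitfield per keyword argument accepted by the method.
 */
struct suppliedp_model {
  unsigned colour: 1;
  unsigned width: 1;
};

/* Model of a method body.  Because `width' has no declared default in
 * this example, its value is unspecified unless suppliedp.width is set,
 * so the body checks the flag before trusting the argument.
 */
int pen_width(struct suppliedp_model suppliedp, int colour, int width)
{
  (void)colour;                         /* not needed for this example */
  if (!suppliedp.width) return (1);     /* caller omitted `width' */
  return (width);
}
```

A keyword argument with a declared default can be used directly; the flag
matters only when the method must distinguish `omitted' from `passed'.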
640 | ||
3cc520db | 641 | %%%-------------------------------------------------------------------------- |
d24d47f5 MW |
642 | \section{The object lifecycle} \label{sec:concepts.lifecycle} |
643 | ||
644 | \subsection{Creation} \label{sec:concepts.lifecycle.birth} | |
645 | ||
646 | Construction of a new instance of a class involves three steps. | |
647 | \begin{enumerate} | |
648 | \item \emph{Allocation} arranges for there to be storage space for the | |
649 | instance's slots and associated metadata. | |
650 | \item \emph{Imprinting} fills in the instance's metadata, associating the | |
651 | instance with its class. | |
652 | \item \emph{Initialization} stores appropriate initial values in the | |
653 | instance's slots, and maybe links it into any external data structures as | |
654 | necessary. | |
655 | \end{enumerate} | |
656 | The \descref{SOD_DECL}[macro]{mac} handles constructing instances with | |
657 | automatic storage duration (`on the stack'). Currently, there is no built-in | |
658 | support for constructing dynamically-allocated instances. | |
659 | ||
660 | \subsubsection{Allocation} | |
661 | Instances of most classes (specifically including those classes defined by | |
662 | Sod itself) can be held in any storage of sufficient size. The in-memory | |
663 | layout of an instance of some class~$C$ is described by the type @|struct | |
664 | $C$__ilayout|, and if the relevant class is known at compile time then the | |
665 | best way to discover the layout size is with the @|sizeof| operator. Failing | |
666 | that, the size required to hold an instance of $C$ is available in a slot in | |
667 | $C$'s class object, as @|$C$__class@->cls.initsz|. | |
668 | ||
669 | It is not in general sufficient to declare, or otherwise allocate, an object | |
670 | of the class type $C$. The class type only describes a single chain of the | |
671 | object's layout. It is nearly always an error to use the class type as if it | |
672 | were a \emph{complete type}, e.g., to declare objects or arrays of the class | |
673 | type, or to enquire about its size or alignment requirements. | |
674 | ||
675 | Instance layouts may be declared as objects with automatic storage duration | |
676 | (colloquially, `allocated on the stack') or allocated dynamically, e.g., | |
677 | using @|malloc|. They may be included as members of structures or unions, or | |
678 | elements of arrays. Sod's runtime system doesn't retain addresses of | |
679 | instances, so, for example, Sod doesn't make using fancy allocators which | |
680 | sometimes move objects around in memory any more difficult than it needs to | |
681 | be. | |
682 | ||
683 | There isn't any way to discover the alignment required for a particular | |
684 | class's instances at runtime; it's best to be conservative and assume that | |
685 | the platform's strictest alignment requirement applies. | |
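
One conservative approach, when carving instance storage out of a larger
buffer rather than calling @|malloc|, is to round every allocation up to the
alignment of the C11 type @|max_align_t|.  The following is a minimal sketch
under that assumption; the @|arena| structure and @|arena_alloc| function are
hypothetical and not part of Sod, and the buffer itself is assumed to be
suitably aligned already.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical bump allocator over a caller-provided buffer.  The buffer
 * at `base' must itself be aligned for max_align_t.
 */
struct arena {
  unsigned char *base;
  size_t size, used;
};

/* Allocate SZ bytes at the strictest fundamental alignment, or return
 * null if the arena has insufficient space left.  For a Sod instance, SZ
 * would be the class's instance size.
 */
void *arena_alloc(struct arena *a, size_t sz)
{
  size_t align = alignof(max_align_t);
  size_t start = (a->used + align - 1)/align*align;

  if (start > a->size || sz > a->size - start) return (0);
  a->used = start + sz;
  return (a->base + start);
}
```

Storage obtained directly from @|malloc| needs no such care, since @|malloc|
already returns memory aligned for any object with a fundamental alignment
requirement.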
686 | ||
687 | The following simple function correctly allocates and returns space for an | |
688 | instance of a class given a pointer to its class object @<cls>. | |
689 | \begin{prog} | |
690 | void *allocate_instance(const SodClass *cls) \\ \ind | |
691 | \{ return malloc(cls@->cls.initsz); \} | |
692 | \end{prog} | |
693 | ||
694 | \subsubsection{Imprinting} | |
695 | Once storage has been allocated, it must be \emph{imprinted} before it can be | |
696 | used as an instance of a class, e.g., before any messages can be sent to it. | |
697 | ||
698 | Imprinting an instance stores some metadata about its direct class in the | |
699 | instance structure, so that the rest of the program (and Sod's runtime | |
700 | library) can tell what sort of object it is, and how to use it.\footnote{% | |
701 | Specifically, imprinting an instance's storage involves storing the | |
702 | appropriate vtable pointers in the right places in it.} % | |
703 | A class object's @|imprint| slot points to a function which will correctly | |
704 | imprint storage for one of that class's instances. | |
705 | ||
706 | Once an instance's storage has been imprinted, it is technically possible to | |
707 | send messages to the instance; however, because the instance's slots are still | |
708 | uninitialized at this point, the applicable methods are unlikely to do much | |
709 | of any use unless they've been written specifically for the purpose. | |
710 | ||
711 | The following simple function imprints storage at address @<p> as an instance | |
712 | of a class, given a pointer to its class object @<cls>. | |
713 | \begin{prog} | |
714 | void imprint_instance(const SodClass *cls, void *p) \\ \ind | |
715 | \{ cls@->cls.imprint(p); \} | |
716 | \end{prog} | |
717 | ||
718 | \subsubsection{Initialization} | |
719 | The final step for constructing a new instance is to \emph{initialize} it, to | |
720 | establish the necessary invariants for the instance itself and the | |
721 | environment in which it operates. | |
722 | ||
723 | Details of initialization are necessarily class-specific, but typically it | |
724 | involves setting the instance's slots to appropriate values, and possibly | |
725 | linking it into some larger data structure to keep track of it. | |
726 | ||
727 | Classes can declare initial values for their slots. A class object's @|init| | |
728 | slot points to a function which will establish the appropriate initial values | |
729 | for a new instance's slots. Slots are not initialized in any particularly | |
54fa3df9 | 730 | useful order. |
d24d47f5 MW |
731 | |
732 | The provided initialization protocol is extremely simplistic; most notably, | |
733 | it's not possible to pass parameters into the initialization process. | |
734 | Classes which have more complex requirements will need to define and | |
735 | implement their own additional (or alternative) protocols. | |
736 | ||
737 | \subsubsection{Example} | |
738 | The following is a simple function, with syntactic-sugar macro, which | |
739 | allocates storage for an instance of a class, imprints and initializes it, and | |
740 | returns a pointer to the new instance. | |
741 | \begin{prog} | |
742 | void *make_instance(const SodClass *c) \\ | |
743 | \{ \\ \ind | |
744 | void *p = malloc(c@->cls.initsz); \\ | |
745 | if (!p) return (0); \\ | |
54fa3df9 | 746 | c@->cls.imprint(p); \\ |
d24d47f5 MW |
747 | c@->cls.init(p); \\ |
748 | return (p); \- \\ | |
749 | \} | |
750 | \\+ | |
751 | \#define MAKE(cls) (cls *)make_instance(cls\#\#__class) | |
752 | \end{prog} | |
753 | ||
754 | ||
755 | \subsection{Destruction} | |
756 | \label{sec:concepts.lifecycle.death} | |
757 | ||
758 | Destruction of an instance, when it is no longer required, consists of two | |
759 | steps. | |
760 | \begin{enumerate} | |
761 | \item \emph{Teardown} releases any resources held by the instance and | |
762 | disentangles it from any external data structures. | |
763 | \item \emph{Deallocation} releases the memory used to store the instance so | |
764 | that it can be reused. | |
765 | \end{enumerate} | |
766 | ||
767 | \subsubsection{Teardown} | |
768 | Details of teardown are class-specific, but typically it involves releasing | |
769 | resources held by the instance, and possibly unlinking it from some larger | |
770 | data structure which used to keep track of it. | |
771 | ||
772 | There is no provided protocol for teardown: classes whose instances require | |
773 | teardown behaviour must define and implement an appropriate protocol of their | |
774 | own. The following class may serve for simple cases. | |
775 | \begin{prog} | |
776 | [nick = disposable] \\ | |
777 | class DisposableObject : SodObject \{ \\- \ind | |
778 | void release() \{ ; \} \\ | |
779 | \quad /* Release resources held by the receiver. */ \- \\- | |
780 | \} | |
781 | \\+ | |
782 | code c : user \{ \\- \ind | |
783 | /* If p is a DisposableObject then release its resources. */ \\ | |
784 | void maybe_dispose(void *p) \\ | |
785 | \{ \\ \ind | |
786 | DisposableObject *d = SOD_CONVERT(DisposableObject, p); \\ | |
787 | if (d) DisposableObject_release(d); \- \\ | |
788 | \} \- \\ | |
789 | \} | |
790 | \end{prog} | |
791 | ||
792 | \subsubsection{Deallocation} | |
793 | The details of instance deallocation are obviously specific to the allocation | |
794 | strategy used by the instance, and this is often orthogonal to the object's | |
795 | class. | |
796 | ||
797 | The code which makes the decision to destroy an object may often not be aware | |
798 | of the object's direct class. Low-level details of deallocation often | |
799 | require the proper base address of the instance's storage, which can be | |
800 | determined using the \descref{SOD_INSTBASE}[macro]{mac}. | |
801 | ||
802 | \subsubsection{Example} | |
803 | The following is a counterpart to the @|make_instance| function | |
804 | (\xref{sec:concepts.lifecycle.birth}), which tears down and deallocates an | |
805 | instance allocated using @|malloc|. | |
806 | \begin{prog} | |
807 | void free_instance(void *p) \\ | |
808 | \{ \\ \ind | |
809 | SodObject *obj = p; \\ | |
810 | maybe_dispose(p); \\ | |
811 | free(SOD_INSTBASE(obj)); \- \\ | |
812 | \} | |
813 | \end{prog} | |
814 | ||
815 | %%%-------------------------------------------------------------------------- | |
3cc520db | 816 | \section{Metaclasses} \label{sec:concepts.metaclasses} |
1f7d590d MW |
817 | |
818 | %%%----- That's all, folks -------------------------------------------------- | |
819 | ||
820 | %%% Local variables: | |
821 | %%% mode: LaTeX | |
822 | %%% TeX-master: "sod.tex" | |
823 | %%% TeX-PDF-mode: t | |
824 | %%% End: |