\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[palatino, helvetica, courier, maths=cmr]{mdwfonts}
+\usepackage{tikz}
\usepackage{syntax}
\usepackage{sverb}
\usepackage{mdwtab}
+\usepackage[mdwmargin]{mdwthm}
+\usepackage{amssymb}
\usepackage{footnote}
\usepackage{at}
\usepackage{mdwref}
\def\m@maybe{\ifmmode\else$\let\m@maybe@end$\fi}
\def\@scripts{\futurelet\@ch\@scripts@i}
+\def\chain#1#2{\mathsf{ch}_{#1}(#2)}
+\def\chainhead#1#2{\mathsf{hd}_{#1}(#2)}
+\def\chaintail#1#2{\mathsf{tl}_{#1}(#2)}
+
+\let\implies\Rightarrow
+
\atdef ;#1\\{\normalfont\itshape;#1\\}
\let\@@grammar\grammar
\def\grammar{\def\textbar{\hbox{$|$}}\@@grammar}
\label{sec:syntax}
Fortunately, Sod is syntactically quite simple. I've used a little slightly
-unusual notation in order to make the presentation easier to read.
+unusual notation in order to make the presentation easier to read. For any
+nonterminal $x$:
\begin{itemize}
\item $\epsilon$ denotes the empty nonterminal:
\begin{quote}
$\epsilon$ ::=
\end{quote}
-\item @[@<item>@] means an optional @<item>:
+\item @[$x$@] means an optional $x$:
\begin{quote}
- \syntax{@[<item>@] ::= $\epsilon$ @! <item>}
+ \syntax{@[$x$@] ::= $\epsilon$ @! $x$}
\end{quote}
-\item @<item>^* means a sequence of zero or more @<item>s:
+\item $x^*$ means a sequence of zero or more $x$s:
\begin{quote}
- \syntax{@<item>^* ::= $\epsilon$ @! @<item>^* <item>}
+ \syntax{$x^*$ ::= $\epsilon$ @! $x^*$ $x$}
\end{quote}
-\item @<item>^+ means a sequence of one or more @<item>s:
+\item $x^+$ means a sequence of one or more $x$s:
\begin{quote}
- \syntax{@<item>^+ ::= <item> @<item>^*}
+ \syntax{$x^+$ ::= $x$ $x^*$}
\end{quote}
-\item @<item-list> means a sequence of one or more @<item>s separated
+\item $x$@<-list> means a sequence of one or more $x$s separated
by commas:
\begin{quote}
- \syntax{<item-list> ::= <item> @! <item-list> "," <item>}
+ \syntax{$x$<-list> ::= $x$ @! $x$<-list> "," $x$}
\end{quote}
\end{itemize}
\begin{grammar}
<token> ::= <identifier>
-\alt <reserved-word>
\alt <string-literal>
\alt <char-literal>
\alt <integer-literal>
\alt <punctuation>
\end{grammar}
-This syntax is slightly ambiguous. The following two rules serve to
-disambiguate:
-\begin{enumerate}
-\item Reserved words take precedence. All @<reserved-word>s are
- syntactically @<identifier>s; Sod resolves the ambiguity in favour of
- @<reserved-word>.
-\item `Maximal munch'. In other cases, at each stage we take the longest
- sequence of characters which could be a token.
-\end{enumerate}
+This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal
+munch} rule: at each stage we take the longest sequence of characters which
+could be a token.
\subsubsection{Identifiers} \label{sec:syntax.lex.id}
\textsf{alpha-char-p} in the hosting Lisp system. For portability,
programmers are encouraged to limit themselves to the standard ASCII letters.
-\subsubsection{Reserved words} \label{sec:syntax.lex.reserved}
-
-\begin{grammar}
-<reserved-word> ::=
-"char" | "class" | "code" | "const" | "double" | "enum" |
-"extern" | "float" | "import" | "int" | "lisp" | "load" | "long"
-| "restrict" | "short" | "signed" | "struct" | "typename" |
-"union" | "unsigned" | "void" | "volatile"
-\end{grammar}
-
-Many of these are borrowed from~C; however, some (e.g., @"import" and
-@"lisp") are not, and some C reserved words are not reserved (e.g.,
-@"static").
+There are no reserved words at the lexical level, but the higher-level syntax
+recognizes certain identifiers as \emph{keywords} in some contexts. There is
+also an ambiguity (inherited from C) in the declaration syntax which is
+settled by distinguishing type names from other identifiers at a lexical
+level.
\subsubsection{String and character literals} \label{sec:syntax.lex.string}
\subsubsection{Punctuation} \label{sec:syntax.lex.punct}
\begin{grammar}
-<punctuation> ::= any character other than "\"" or "'"
+<punctuation> ::= any nonalphanumeric character other than "_", "\"" or "'"
\end{grammar}
-Due to the `maximal munch' rule, @<punctuation> tokens cannot be
-alphanumeric.
-
\subsubsection{Comments} \label{sec:lex-comment}
\begin{grammar}
the Sod translator is to provide the extension as a standard ASDF system (or
similar) and leave a dropping @"foo-extension.lisp" in the module path saying
something like
-\begin{listing}
-(asdf:operate 'asdf:load-op :foo-extension)
-\end{listing}
+\begin{quote}
+ \textsf{(asdf:load-system :foo-extension)}
+\end{quote}
which will arrange for the extension to be compiled if necessary.
(This approach means that the language doesn't need to depend on any
%%%--------------------------------------------------------------------------
\section{Classes}
+\label{sec:class}
-\subsection{Classes and superclasses}
+\subsection{Classes and superclasses} \label{sec:class.defs}
A @<full-class-definition> must list one or more existing classes to be the
\emph{direct superclasses} for the new class being defined. We make the
The predefined class @|SodObject| has no direct superclasses; it is unique in
this respect. All classes are subclasses of @|SodObject|.
-\subsection{The class precedence list}
+\subsection{The class precedence list} \label{sec:class.cpl}
Let $C$ be a class. The superclasses of $C$ form a directed graph, with an
edge from each class to each of its direct superclasses. This is the
$X$ is earlier than $Y$ in $C$'s class precedence list. If $C$ is clear from
context then we omit it, saying simply that $X$ is more specific than $Y$.
-\subsection{Instances and metaclasses}
+\subsection{Instances and metaclasses} \label{sec:class.meta}
A class defines the structure and behaviour of its \emph{instances}: run-time
objects created (possibly) dynamically. An instance is an instance of only
@|SodClass| has @|SodObject| as its only direct superclass. @|SodClass| is
its own metaclass.
-\subsection{Items and inheritance}
+To make matters more complicated, Sod has \emph{two} distinct metalevels: as
+well as the runtime metalevel, as discussed above, there's a compile-time
+metalevel hosted in the Sod translator. Since Sod is written in Common Lisp,
+a Sod class's compile-time metaclass is a CLOS class. The usual compile-time
+metaclass is @|sod-class|. The compile-time metalevel is the subject of
+\xref{ch:api}.
+
+\subsection{Items and inheritance} \label{sec:class.inherit}
A class definition also declares \emph{slots}, \emph{messages},
\emph{initializers} and \emph{methods} -- collectively referred to as
superclasses then items defined in that superclass may appear duplicated in
the subclass. Sod does not have this notion.
-\subsubsection{Slots}
+\subsubsection{Slots} \label{sec:class.inherit.slots}
A \emph{slot} is a unit of state. In other object systems, slots may be
called `fields', `member variables', or `instance variables'.
whether direct or inherited. Changing the value of an instance's slot
doesn't affect other instances.
-\subsubsection{Initializers}
+\subsubsection{Initializers} \label{sec:class.inherit.init}
Mumble.
-\subsubsection{Messages}
+\subsubsection{Messages} \label{sec:class.inherit.messages}
A \emph{message} is the stimulus for behaviour. In Sod, a class must define,
statically, the name and format of the messages it is able to receive and the
values it will return in reply. In this respect, a message is similar to
At run-time, one sends a message to an instance by invoking a function
obtained from the instance's \emph{vtable}: \xref{sec:fixme-vtable}.
-\subsubsection{Methods}
+\subsubsection{Methods} \label{sec:class.inherit.methods}
A \emph{method} is a unit of behaviour. In other object systems, methods may
be called `member functions'.
the applicable methods are invoked are described fully in
\xref{sec:fixme-method-combination}.
-\subsection{Chains and instance layout}
+\subsection{Chains and instance layout} \label{sec:class.layout}
+
+C is a rather low-level language, and in particular it exposes details of the
+way data is laid out in memory. Since an instance of a class~$C$ should be
+(at least in principle) usable anywhere an instance of some superclass $B
+\succeq C$ is expected, this implies that an instance of the subclass $C$
+needs to contain within it a complete instance of each superclass $B$, laid
+out according to the rules of instances of $B$, so that if we have (the
+address of) an instance of $C$, we can easily construct a pointer to a thing
+which looks like an instance of $B$ contained within it.
+
+Specifically, the information we need to retain for an instance of a
+class~$C$ is:
+\begin{itemize}
+\item the values of each of the slots defined by $C$, including those defined
+ by superclasses;
+\item information which will let us convert a pointer to $C$ into a pointer
+ to any superclass $B \succeq C$;
+\item information which will let us call the appropriate effective method for
+ each message defined by $C$, including those defined by superclasses; and
+\item some additional meta-level information, such as how to find the class
+ object for $C$ given (the address of) one of its instances.
+\end{itemize}
+
+Observe that, while each distinct instance must clearly have its own storage
+for slots, all instances of $C$ can share a single copy of the remaining
+information. The individual instance only needs to keep a pointer to this
+shared table, which, inspired by the similar structure in many \Cplusplus\
+ABIs, are called a \emph{vtable}.
+
+The easiest approach would be to decide that instances of $C$ are exactly
+like instances of $B$, only with extra space at the end for the extra slots
+which $C$ defines over and above those already existing in $B$. Conversion
+is then trivial: a pointer to an instance of $C$ can be converted to a
+pointer to an instance of some superclass $B$ simply by casting. Even though
+the root class @|SodObject| doesn't have any slots at all, its instances will
+still need a vtable so that you can find its class object: the address of the
+vtable therefore needs to be at the very start of the instance structure.
+Again, a vtable for a superclass would have a vtable for each of its
+superclasses as a prefix, with new items added afterwards.
+
+This appealing approach works well for an object system which only permits
+single inheritance of both state and behaviour. Alas, it breaks down when
+multiple inheritance is allowed: $C$ can be a subclass of both $B$ and $B'$,
+even though $B$ is not a subclass of $B'$, nor \emph{vice versa}; so, in
+general, $B$'s instance structure will not be a prefix of $B'$'s, nor will
+$B'$'s be a prefix of $B$'s, and therefore $C$ cannot have both $B$ and $B'$
+as a prefix.
+
+A (non-root) class may -- though need not -- have a distinguished \emph{link}
+superclass, which need not be a direct superclass. Furthermore, each
+class~$C$ must satisfy the \emph{chain condition}: for any superclass $A$ of
+$C$, there can be at most one other superclass of $C$ whose link superclass
+is $A$.\footnote{%
+ That is, it's permitted for two classes $B$ and $B'$ to have the same link
+ superclass $A$, but $B$ and $B'$ can't then both be superclasses of the
+ same class $C$.} %
+Therefore, the links partition the superclasses of~$C$ into nice linear
+\emph{chains}, such that each superclass is a member of exactly one chain.
+If a class~$B$ has a link superclass~$A$, then $B$'s \emph{level} is one more
+than that of $A$; otherwise $B$ is called a \emph{chain head} and its level
+is zero. If the classes in a chain are written in a list, chain head first,
+then the level of each class gives its index in the list.
+
+Chains therefore allow us to recover some of the linearity properties which
+made layout simple in the case of single inheritance. The instance structure
+for a class $C$ contains a substructure for each of $C$'s superclass chains;
+a pointer to an object of class $C$ actually points to the substructure for
+the chain containing $C$. The order of these substructures is unimportant
+for now.\footnote{%
+ The chains appear in the order in which their most specific classes appear
+ in $C$'s class precedence list. This guarantees that the chain containing
+ $C$ itself appears first, so that a pointer to $C$'s instance structure is
+ actually a pointer to $C$'s chain substructure. Apart from that, it's a
+ simple, stable, but basically arbitrary choice which can't be changed
+ without breaking the ABI.} %
+The substructure for each chain begins with a pointer to a vtable, followed
+by a structure for each superclass in the chain containing the slots defined
+by that superclass, with the chain head (least specific class) first.
+
+Suppose we have a pointer to (static) type $C$, and want to convert it into a
+pointer to some superclass $B$ of $C$ -- an \emph{upcast}.\footnote{%
+ In the more general case, we have a pointer to static type $C$, which
+ actually points to an object of some subclass $D$ of $C$, and want to
+ convert it into a pointer to type $B$. Such a conversion is called a
+ \emph{downcast} if $B$ is a subclass of $C$, or a \emph{cross-cast}
+ otherwise. Downcasts and cross-casts require complicated run-time
+ checking, and can will fail unless $B$ is a superclass of $D$.} %
+If $B$ is in the same chain as $C$ -- an \emph{in-chain upcast} -- then the
+pointer value is already correct and it's only necessary to cast it
+appropriately. Otherwise -- a \emph{cross-chain upcast} -- the pointer needs
+to be adjusted to point to a different chain substructure. Since the lengths
+and relative positions of the chain substructures vary between classes, the
+adjustments are stored in the vtable. Cross-chain upcasts are therefore a
+bit slower than in-chain upcasts.
+
+Each chain has its own separate vtable, because much of the metadata stored
+in the vtable is specific to a particular chain. For example:
+\begin{itemize}
+\item offsets to other chains' substructures will vary depending on which
+ chain we start from; and
+\item entry points to methods {
+
+%%%--------------------------------------------------------------------------
+\chapter{The Lisp programming interface} \label{ch:api}
+
+%% output for `h' files
+%%
+%% prologue
+%% guard start
+%% typedefs start
+%% typedefs
+%% typedefs end
+%% includes start
+%% includes
+%% includes end
+%% classes start
+%% CLASS banner
+%% CLASS islots start
+%% CLASS islots slots
+%% CLASS islots end
+%% CLASS vtmsgs start
+%% CLASS vtmsgs CLASS start
+%% CLASS vtmsgs CLASS slots
+%% CLASS vtmsgs CLASS end
+%% CLASS vtmsgs end
+%% CLASS vtables start
+%% CLASS vtables CHAIN-HEAD start
+%% CLASS vtables CHAIN-HEAD slots
+%% CLASS vtables CHAIN-HEAD end
+%% CLASS vtables end
+%% CLASS vtable-externs
+%% CLASS vtable-externs-after
+%% CLASS methods start
+%% CLASS methods
+%% CLASS methods end
+%% CLASS ichains start
+%% CLASS ichains CHAIN-HEAD start
+%% CLASS ichains CHAIN-HEAD slots
+%% CLASS ichains CHAIN-HEAD end
+%% CLASS ichains end
+%% CLASS ilayout start
+%% CLASS ilayout slots
+%% CLASS ilayout end
+%% CLASS conversions
+%% CLASS object
+%% classes end
+%% guard end
+%% epilogue
+
+%% output for `c' files
+%%
+%% prologue
+%% includes start
+%% includes
+%% includes end
+%% classes start
+%% CLASS banner
+%% CLASS direct-methods start
+%% CLASS direct-methods METHOD start
+%% CLASS direct-methods METHOD body
+%% CLASS direct-methods METHOD end
+%% CLASS direct-methods end
+%% CLASS effective-methods
+%% CLASS vtables start
+%% CLASS vtables CHAIN-HEAD start
+%% CLASS vtables CHAIN-HEAD class-pointer METACLASS
+%% CLASS vtables CHAIN-HEAD base-offset
+%% CLASS vtables CHAIN-HEAD chain-offset TARGET-HEAD
+%% CLASS vtables CHAIN-HEAD vtmsgs CLASS start
+%% CLASS vtables CHAIN-HEAD vtmsgs CLASS slots
+%% CLASS vtables CHAIN-HEAD vtmsgs CLASS end
+%% CLASS vtables CHAIN-HEAD end
+%% CLASS vtables end
+%% CLASS object prepare
+%% CLASS object start
+%% CLASS object CHAIN-HEAD ichain start
+%% CLASS object SUPER slots start
+%% CLASS object SUPER slots
+%% CLASS object SUPER vtable
+%% CLASS object SUPER slots end
+%% CLASS object CHAIN-HEAD ichain end
+%% CLASS object end
+%% classes end
+%% epilogue
+
+%%%--------------------------------------------------------------------------
\include{sod-backg}
\include{sod-protocol}