From: Mark Wooding <mdw@distorted.org.uk>
Date: Mon, 14 Sep 2015 21:34:48 +0000 (+0100)
Subject: Merge branch 'master' into doc
X-Git-Url: https://git.distorted.org.uk/~mdw/sod/commitdiff_plain/3dca7758421664a838c54b273bd9221f02072045?hp=58a9509014d2d133b91e96c7956957f82bc000a4

Merge branch 'master' into doc

* master: (93 commits)
  Eliminate the separately maintained Lisp system version number.
  src/{builtin,final,frontent}.lisp: `clear-the-decks' makes builtin module.
  src/class-{finalize,layout}-impl.lisp: Error checking on layout slots.
  src/class-finalize-impl.lisp: Remove FIXME which was fixed ages ago.
  src/: Introduce a macro for defining on-demand slots.
  Major effort to plug slot-name leaks.
  doc/list-exports.lisp: Report on generic function methods.
  doc/list-exports.lisp: Strip duplicate exports.
  doc/list-exports.lisp: Better pretty formatting for keywords.
  doc/list-exports.lisp: Mark the start of the class tree dump.
  src/codegen-proto.lisp, doc/list-exports.lisp: Export `inst' readers.
  src/: More missing exports.
  doc/list-exports.lisp: Sort sibling classes by name in the tree.
  doc/list-exports.lisp: Search for exports inside `eval-when' blocks.
  doc/list-exports.lisp: Don't get confused and thing `nil' isn't interned.
  doc/list-exports.lisp: Check for anomalies when preparing reports.
  doc/list-exports.lisp: Some sketchy code to report on exported symbols.
  src/: Fix up some wrong exports.
  src/final.lisp, src/frontend.lisp: Compile methods before dumping.
  src/frontend.lisp: Prepare the builtin module at load time.
  ...
---

diff --git a/doc/sod-backg.tex b/doc/sod-backg.tex
index 0f19c7d..0cb3877 100644
--- a/doc/sod-backg.tex
+++ b/doc/sod-backg.tex
@@ -46,6 +46,8 @@ This is the `is-subclass-of' relation we've been using so far.\footnote{%
   to be a partial order because of cycles in the class graph.  I haven't
   given a great deal of thought to how well Sod would cope with a cyclic
   class graph.} %
+We write $d \succeq c$ and say that $d$ is a superclass of $c$ if and only if
+$c \preceq d$.
 
 The problem comes when we try to resolve inheritance questions.  A class
 should inherit behaviour from its superclasses; but, in a world of multiple
diff --git a/doc/sod-protocol.tex b/doc/sod-protocol.tex
index f0bd115..6ef829b 100644
--- a/doc/sod-protocol.tex
+++ b/doc/sod-protocol.tex
@@ -117,11 +117,11 @@ category (function, class, macro, etc.) on the right.
   The synopsis for a function, generic function or method describes the
   function's lambda-list using the usual syntax.  Note that keyword arguments
   are shown by naming their keywords; in the description, the value passed
-  for the keyword argument @|keyword| is shown as @<keyword>.
+  for the keyword argument @|:keyword| is shown as @<keyword>.
 
   For a method, specializers are shown using the usual @|defmethod| syntax,
   e.g.,
-  \begin{quote}
+  \begin{quote} \sffamily
     some-generic-function ((@<specialized> list) @<unspecialized>)
   \end{quote}
 \end{describe}
@@ -153,7 +153,6 @@ category (function, class, macro, etc.) on the right.
   for @|let|.
 \end{describe}
 
-
 \begin{describe}{cls}{example-class (direct-super other-direct-super) \&key
     :initarg}
   The synopsis for a class lists the class's direct superclasses, and the
diff --git a/doc/sod-tut.tex b/doc/sod-tut.tex
index cff9859..ca686aa 100644
--- a/doc/sod-tut.tex
+++ b/doc/sod-tut.tex
@@ -100,8 +100,8 @@ complicated.  If you use \man{make}{1}, then something like
   SOD = sod
 
   .SUFFIXES: .sod .c .h
-  .sod.c:; \$(SOD) -gc -o \$@@ \$<
-  .sod.h:; \$(SOD) -gh -o \$@@ \$< %
+  .sod.c:; \$(SOD) -tc \$<
+  .sod.h:; \$(SOD) -th \$<
 \end{prog}
 ought to do the job.
 
diff --git a/doc/sod.tex b/doc/sod.tex
index 8a46735..ff0dea4 100644
--- a/doc/sod.tex
+++ b/doc/sod.tex
@@ -3,9 +3,12 @@
 \usepackage[T1]{fontenc}
 \usepackage[utf8]{inputenc}
 \usepackage[palatino, helvetica, courier, maths=cmr]{mdwfonts}
+\usepackage{tikz}
 \usepackage{syntax}
 \usepackage{sverb}
 \usepackage{mdwtab}
+\usepackage[mdwmargin]{mdwthm}
+\usepackage{amssymb}
 \usepackage{footnote}
 \usepackage{at}
 \usepackage{mdwref}
@@ -49,6 +52,12 @@
 \def\m@maybe{\ifmmode\else$\let\m@maybe@end$\fi}
 \def\@scripts{\futurelet\@ch\@scripts@i}
 
+\def\chain#1#2{\mathsf{ch}_{#1}(#2)}
+\def\chainhead#1#2{\mathsf{hd}_{#1}(#2)}
+\def\chaintail#1#2{\mathsf{tl}_{#1}(#2)}
+
+\let\implies\Rightarrow
+
 \atdef ;#1\\{\normalfont\itshape;#1\\}
 \let\@@grammar\grammar
 \def\grammar{\def\textbar{\hbox{$|$}}\@@grammar}
@@ -161,28 +170,29 @@ The following notation is used in this section.
 \label{sec:syntax}
 
 Fortunately, Sod is syntactically quite simple.  I've used a little slightly
-unusual notation in order to make the presentation easier to read.
+unusual notation in order to make the presentation easier to read.  For any
+nonterminal $x$:
 \begin{itemize}
 \item $\epsilon$ denotes the empty nonterminal:
   \begin{quote}
     $\epsilon$ ::=
   \end{quote}
-\item @[@<item>@] means an optional @<item>:
+\item @[$x$@] means an optional $x$:
   \begin{quote}
-    \syntax{@[<item>@] ::= $\epsilon$ @! <item>}
+    \syntax{@[$x$@] ::= $\epsilon$ @! $x$}
   \end{quote}
-\item @<item>^* means a sequence of zero or more @<item>s:
+\item $x^*$ means a sequence of zero or more $x$s:
   \begin{quote}
-    \syntax{@<item>^* ::= $\epsilon$ @! @<item>^* <item>}
+    \syntax{$x^*$ ::= $\epsilon$ @! $x^*$ $x$}
   \end{quote}
-\item @<item>^+ means a sequence of one or more @<item>s:
+\item $x^+$ means a sequence of one or more $x$s:
   \begin{quote}
-    \syntax{@<item>^+ ::= <item> @<item>^*}
+    \syntax{$x^+$ ::= $x$ $x^*$}
   \end{quote}
-\item @<item-list> means a sequence of one or more @<item>s separated
+\item $x$@<-list> means a sequence of one or more $x$s separated
   by commas:
   \begin{quote}
-    \syntax{<item-list> ::= <item> @! <item-list> "," <item>}
+    \syntax{$x$<-list> ::= $x$ @! $x$<-list> "," $x$}
   \end{quote}
 \end{itemize}
 
@@ -194,22 +204,15 @@ collected into tokens according to the following syntax.
 
 \begin{grammar}
 <token> ::= <identifier>
-\alt <reserved-word>
 \alt <string-literal>
 \alt <char-literal>
 \alt <integer-literal>
 \alt <punctuation>
 \end{grammar}
 
-This syntax is slightly ambiguous.  The following two rules serve to
-disambiguate:
-\begin{enumerate}
-\item Reserved words take precedence.  All @<reserved-word>s are
-  syntactically @<identifier>s; Sod resolves the ambiguity in favour of
-  @<reserved-word>.
-\item `Maximal munch'.  In other cases, at each stage we take the longest
-  sequence of characters which could be a token.
-\end{enumerate}
+This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal
+munch} rule: at each stage we take the longest sequence of characters which
+could be a token.
 
 \subsubsection{Identifiers} \label{sec:syntax.lex.id}
 
@@ -233,19 +236,11 @@ The precise definition of @<alpha-char> is left to the function
 \textsf{alpha-char-p} in the hosting Lisp system.  For portability,
 programmers are encouraged to limit themselves to the standard ASCII letters.
 
-\subsubsection{Reserved words} \label{sec:syntax.lex.reserved}
-
-\begin{grammar}
-<reserved-word> ::=
-"char" | "class" | "code" | "const" | "double" | "enum" |
-"extern" | "float" | "import" | "int" | "lisp" | "load" | "long"
-| "restrict" | "short" | "signed" | "struct" | "typename" |
-"union" | "unsigned" | "void" | "volatile"
-\end{grammar}
-
-Many of these are borrowed from~C; however, some (e.g., @"import" and
-@"lisp") are not, and some C reserved words are not reserved (e.g.,
-@"static").
+There are no reserved words at the lexical level, but the higher-level syntax
+recognizes certain identifiers as \emph{keywords} in some contexts.  There is
+also an ambiguity (inherited from C) in the declaration syntax which is
+settled by distinguishing type names from other identifiers at a lexical
+level.
 
 \subsubsection{String and character literals} \label{sec:syntax.lex.string}
 
@@ -302,12 +297,9 @@ binary.  However, length and signedness indicators are not permitted.
 \subsubsection{Punctuation} \label{sec:syntax.lex.punct}
 
 \begin{grammar}
-<punctuation> ::= any character other than "\"" or "'"
+<punctuation> ::= any nonalphanumeric character other than "_", "\"" or "'"
 \end{grammar}
 
-Due to the `maximal munch' rule, @<punctuation> tokens cannot be
-alphanumeric.
-
 \subsubsection{Comments} \label{sec:lex-comment}
 
 \begin{grammar}
@@ -444,9 +436,9 @@ existing compiled files.  The right way to package a substantial extension to
 the Sod translator is to provide the extension as a standard ASDF system (or
 similar) and leave a dropping @"foo-extension.lisp" in the module path saying
 something like
-\begin{listing}
-(asdf:operate 'asdf:load-op :foo-extension)
-\end{listing}
+\begin{quote}
+  \textsf{(asdf:load-system :foo-extension)}
+\end{quote}
 which will arrange for the extension to be compiled if necessary.
 
 (This approach means that the language doesn't need to depend on any
@@ -888,8 +880,9 @@ void *sod_convert(const SodClass *cls, const void *obj)
 
 %%%--------------------------------------------------------------------------
 \section{Classes}
+\label{sec:class}
 
-\subsection{Classes and superclasses}
+\subsection{Classes and superclasses} \label{sec:class.defs}
 
 A @<full-class-definition> must list one or more existing classes to be the
 \emph{direct superclasses} for the new class being defined.  We make the
@@ -905,7 +898,7 @@ following definitions.
 The predefined class @|SodObject| has no direct superclasses; it is unique in
 this respect.  All classes are subclasses of @|SodObject|.
 
-\subsection{The class precedence list}
+\subsection{The class precedence list} \label{sec:class.cpl}
 
 Let $C$ be a class.  The superclasses of $C$ form a directed graph, with an
 edge from each class to each of its direct superclasses.  This is the
@@ -934,7 +927,7 @@ We say that \emph{$X$ is more specific than $Y$ as a superclass of $C$} if
 $X$ is earlier than $Y$ in $C$'s class precedence list.  If $C$ is clear from
 context then we omit it, saying simply that $X$ is more specific than $Y$.
 
-\subsection{Instances and metaclasses}
+\subsection{Instances and metaclasses} \label{sec:class.meta}
 
 A class defines the structure and behaviour of its \emph{instances}: run-time
 objects created (possibly) dynamically.  An instance is an instance of only
@@ -951,7 +944,14 @@ The predefined class @|SodClass| is the default metaclass for new classes.
 @|SodClass| has @|SodObject| as its only direct superclass.  @|SodClass| is
 its own metaclass.
 
-\subsection{Items and inheritance}
+To make matters more complicated, Sod has \emph{two} distinct metalevels: as
+well as the runtime metalevel, as discussed above, there's a compile-time
+metalevel hosted in the Sod translator.  Since Sod is written in Common Lisp,
+a Sod class's compile-time metaclass is a CLOS class.  The usual compile-time
+metaclass is @|sod-class|.  The compile-time metalevel is the subject of
+\xref{ch:api}.
+
+\subsection{Items and inheritance} \label{sec:class.inherit}
 
 A class definition also declares \emph{slots}, \emph{messages},
 \emph{initializers} and \emph{methods} -- collectively referred to as
@@ -967,7 +967,7 @@ multiple paths in the superclass graph from a class to one of its
 superclasses then items defined in that superclass may appear duplicated in
 the subclass.  Sod does not have this notion.
 
-\subsubsection{Slots}
+\subsubsection{Slots} \label{sec:class.inherit.slots}
 A \emph{slot} is a unit of state.  In other object systems, slots may be
 called `fields', `member variables', or `instance variables'.
 
@@ -981,10 +981,10 @@ At run-time, each instance of the class holds a separate value for each slot,
 whether direct or inherited.  Changing the value of an instance's slot
 doesn't affect other instances.
 
-\subsubsection{Initializers}
+\subsubsection{Initializers} \label{sec:class.inherit.init}
 Mumble.
 
-\subsubsection{Messages}
+\subsubsection{Messages} \label{sec:class.inherit.messages}
 A \emph{message} is the stimulus for behaviour.  In Sod, a class must define,
 statically, the name and format of the messages it is able to receive and the
 values it will return in reply.  In this respect, a message is similar to
@@ -999,7 +999,7 @@ each other or with direct messages, even if they have the same name.
 At run-time, one sends a message to an instance by invoking a function
 obtained from the instance's \emph{vtable}: \xref{sec:fixme-vtable}.
 
-\subsubsection{Methods}
+\subsubsection{Methods} \label{sec:class.inherit.methods}
 A \emph{method} is a unit of behaviour.  In other object systems, methods may
 be called `member functions'.
 
@@ -1009,7 +1009,193 @@ class or any of its superclasses are \emph{applicable}.  The details of how
 the applicable methods are invoked are described fully in
 \xref{sec:fixme-method-combination}.
 
-\subsection{Chains and instance layout}
+\subsection{Chains and instance layout} \label{sec:class.layout}
+
+C is a rather low-level language, and in particular it exposes details of the
+way data is laid out in memory.  Since an instance of a class~$C$ should be
+(at least in principle) usable anywhere an instance of some superclass $B
+\succeq C$ is expected, this implies that an instance of the subclass $C$
+needs to contain within it a complete instance of each superclass $B$, laid
+out according to the rules of instances of $B$, so that if we have (the
+address of) an instance of $C$, we can easily construct a pointer to a thing
+which looks like an instance of $B$ contained within it.
+
+Specifically, the information we need to retain for an instance of a
+class~$C$ is:
+\begin{itemize}
+\item the values of each of the slots defined by $C$, including those defined
+  by superclasses;
+\item information which will let us convert a pointer to $C$ into a pointer
+  to any superclass $B \succeq C$;
+\item information which will let us call the appropriate effective method for
+  each message defined by $C$, including those defined by superclasses; and
+\item some additional meta-level information, such as how to find the class
+  object for $C$ given (the address of) one of its instances.
+\end{itemize}
+
+Observe that, while each distinct instance must clearly have its own storage
+for slots, all instances of $C$ can share a single copy of the remaining
+information.  The individual instance only needs to keep a pointer to this
+shared table, which, inspired by the similar structure in many \Cplusplus\
+ABIs, are called a \emph{vtable}.
+
+The easiest approach would be to decide that instances of $C$ are exactly
+like instances of $B$, only with extra space at the end for the extra slots
+which $C$ defines over and above those already existing in $B$.  Conversion
+is then trivial: a pointer to an instance of $C$ can be converted to a
+pointer to an instance of some superclass $B$ simply by casting.  Even though
+the root class @|SodObject| doesn't have any slots at all, its instances will
+still need a vtable so that you can find its class object: the address of the
+vtable therefore needs to be at the very start of the instance structure.
+Again, a vtable for a superclass would have a vtable for each of its
+superclasses as a prefix, with new items added afterwards.
+
+This appealing approach works well for an object system which only permits
+single inheritance of both state and behaviour.  Alas, it breaks down when
+multiple inheritance is allowed: $C$ can be a subclass of both $B$ and $B'$,
+even though $B$ is not a subclass of $B'$, nor \emph{vice versa}; so, in
+general, $B$'s instance structure will not be a prefix of $B'$'s, nor will
+$B'$'s be a prefix of $B$'s, and therefore $C$ cannot have both $B$ and $B'$
+as a prefix.
+
+A (non-root) class may -- though need not -- have a distinguished \emph{link}
+superclass, which need not be a direct superclass.  Furthermore, each
+class~$C$ must satisfy the \emph{chain condition}: for any superclass $A$ of
+$C$, there can be at most one other superclass of $C$ whose link superclass
+is $A$.\footnote{%
+  That is, it's permitted for two classes $B$ and $B'$ to have the same link
+  superclass $A$, but $B$ and $B'$ can't then both be superclasses of the
+  same class $C$.} %
+Therefore, the links partition the superclasses of~$C$ into nice linear
+\emph{chains}, such that each superclass is a member of exactly one chain.
+If a class~$B$ has a link superclass~$A$, then $B$'s \emph{level} is one more
+than that of $A$; otherwise $B$ is called a \emph{chain head} and its level
+is zero.  If the classes in a chain are written in a list, chain head first,
+then the level of each class gives its index in the list.
+
+Chains therefore allow us to recover some of the linearity properties which
+made layout simple in the case of single inheritance.  The instance structure
+for a class $C$ contains a substructure for each of $C$'s superclass chains;
+a pointer to an object of class $C$ actually points to the substructure for
+the chain containing $C$.  The order of these substructures is unimportant
+for now.\footnote{%
+  The chains appear in the order in which their most specific classes appear
+  in $C$'s class precedence list.  This guarantees that the chain containing
+  $C$ itself appears first, so that a pointer to $C$'s instance structure is
+  actually a pointer to $C$'s chain substructure.  Apart from that, it's a
+  simple, stable, but basically arbitrary choice which can't be changed
+  without breaking the ABI.} %
+The substructure for each chain begins with a pointer to a vtable, followed
+by a structure for each superclass in the chain containing the slots defined
+by that superclass, with the chain head (least specific class) first.
+
+Suppose we have a pointer to (static) type $C$, and want to convert it into a
+pointer to some superclass $B$ of $C$ -- an \emph{upcast}.\footnote{%
+  In the more general case, we have a pointer to static type $C$, which
+  actually points to an object of some subclass $D$ of $C$, and want to
+  convert it into a pointer to type $B$.  Such a conversion is called a
+  \emph{downcast} if $B$ is a subclass of $C$, or a \emph{cross-cast}
+  otherwise.  Downcasts and cross-casts require complicated run-time
+  checking, and can will fail unless $B$ is a superclass of $D$.} %
+If $B$ is in the same chain as $C$ -- an \emph{in-chain upcast} -- then the
+pointer value is already correct and it's only necessary to cast it
+appropriately.  Otherwise -- a \emph{cross-chain upcast} -- the pointer needs
+to be adjusted to point to a different chain substructure.  Since the lengths
+and relative positions of the chain substructures vary between classes, the
+adjustments are stored in the vtable.  Cross-chain upcasts are therefore a
+bit slower than in-chain upcasts.
+
+Each chain has its own separate vtable, because much of the metadata stored
+in the vtable is specific to a particular chain.  For example:
+\begin{itemize}
+\item offsets to other chains' substructures will vary depending on which
+  chain we start from; and
+\item entry points to methods {
+
+%%%--------------------------------------------------------------------------
+\chapter{The Lisp programming interface} \label{ch:api}
+
+%% output for `h' files
+%%
+%% prologue
+%% guard start
+%% typedefs start
+%% typedefs
+%% typedefs end
+%% includes start
+%% includes
+%% includes end
+%% classes start
+%% CLASS banner
+%% CLASS islots start
+%% CLASS islots slots
+%% CLASS islots end
+%% CLASS vtmsgs start
+%% CLASS vtmsgs CLASS start
+%% CLASS vtmsgs CLASS slots
+%% CLASS vtmsgs CLASS end
+%% CLASS vtmsgs end
+%% CLASS vtables start
+%% CLASS vtables CHAIN-HEAD start
+%% CLASS vtables CHAIN-HEAD slots
+%% CLASS vtables CHAIN-HEAD end
+%% CLASS vtables end
+%% CLASS vtable-externs
+%% CLASS vtable-externs-after
+%% CLASS methods start
+%% CLASS methods
+%% CLASS methods end
+%% CLASS ichains start
+%% CLASS ichains CHAIN-HEAD start
+%% CLASS ichains CHAIN-HEAD slots
+%% CLASS ichains CHAIN-HEAD end
+%% CLASS ichains end
+%% CLASS ilayout start
+%% CLASS ilayout slots
+%% CLASS ilayout end
+%% CLASS conversions
+%% CLASS object
+%% classes end
+%% guard end
+%% epilogue
+
+%% output for `c' files
+%%
+%% prologue
+%% includes start
+%% includes
+%% includes end
+%% classes start
+%% CLASS banner
+%% CLASS direct-methods start
+%% CLASS direct-methods METHOD start
+%% CLASS direct-methods METHOD body
+%% CLASS direct-methods METHOD end
+%% CLASS direct-methods end
+%% CLASS effective-methods
+%% CLASS vtables start
+%% CLASS vtables CHAIN-HEAD start
+%% CLASS vtables CHAIN-HEAD class-pointer METACLASS
+%% CLASS vtables CHAIN-HEAD base-offset
+%% CLASS vtables CHAIN-HEAD chain-offset TARGET-HEAD
+%% CLASS vtables CHAIN-HEAD vtmsgs CLASS start
+%% CLASS vtables CHAIN-HEAD vtmsgs CLASS slots
+%% CLASS vtables CHAIN-HEAD vtmsgs CLASS end
+%% CLASS vtables CHAIN-HEAD end
+%% CLASS vtables end
+%% CLASS object prepare
+%% CLASS object start
+%% CLASS object CHAIN-HEAD ichain start
+%% CLASS object SUPER slots start
+%% CLASS object SUPER slots
+%% CLASS object SUPER vtable
+%% CLASS object SUPER slots end
+%% CLASS object CHAIN-HEAD ichain end
+%% CLASS object end
+%% classes end
+%% epilogue
+
+%%%--------------------------------------------------------------------------
 
 \include{sod-backg}
 \include{sod-protocol}