From: Mark Wooding Date: Sun, 30 Aug 2015 09:58:38 +0000 (+0100) Subject: doc/: Sort out the manual structure. Write stuff. X-Git-Url: https://git.distorted.org.uk/~mdw/sod/commitdiff_plain/1f7d590d9c7b87442c8d8b6424ed4f769d377692 doc/: Sort out the manual structure. Write stuff. Don't expect great commit messages for a while. --- diff --git a/doc/sod-protocol.tex b/doc/clang.tex similarity index 68% rename from doc/sod-protocol.tex rename to doc/clang.tex index 6ef829b..c26ece5 100644 --- a/doc/sod-protocol.tex +++ b/doc/clang.tex @@ -1,13 +1,13 @@ %%% -*-latex-*- %%% -%%% Description of the internal class structure and protocol +%%% C language utilities %%% -%%% (c) 2009 Straylight/Edgeware +%%% (c) 2015 Straylight/Edgeware %%% %%%----- Licensing notice --------------------------------------------------- %%% -%%% This file is part of the Simple Object Definition system. +%%% This file is part of the Sensible Object Design, an object system for C. %%% %%% SOD is free software; you can redistribute it and/or modify %%% it under the terms of the GNU General Public License as published by @@ -23,152 +23,12 @@ %%% along with SOD; if not, write to the Free Software Foundation, %%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. -\chapter{Protocol overview} \label{ch:proto} - -This chapter provides an overview of the Sod translator's internal object -model. It describes most of the important classes and generic functions, how -they are used to build a model of a Sod module and produce output code, and -how an extension might modify the translator's behaviour. - -I assume familiarity with the Common Lisp Object System (CLOS). Familiarity -with the CLOS Metaobject Protocol isn't necessary but may be instructive. 
- -%%%-------------------------------------------------------------------------- -\section{A tour through the translator} - -At the very highest level, the Sod translator works in two phases: it -\emph{parses} source files into an internal representation, and then it -\emph{generates} output files from the internal representation. - -The function @|read-module| is given a pathname for a file: it opens the -file, parses the program text, and returns a @|module| instance describing -the classes and other items found. - -At the other end, the main output function is @|output-module|, which is -given a module, an output stream and a - - -%%%-------------------------------------------------------------------------- -\section{Specification conventions} \label{sec:proto.conventions} - -Throughout this specification, the phrase `it is an error' indicates that a -particular circumstance is erroneous and results in unspecified and possibly -incorrect behaviour. In particular, the situation need not be immediately -diagnosed, and the consequences may be far-reaching. - -The following conventions apply throughout this specification. - -\begin{itemize} - -\item If a specification describes an argument as having a particular type or - syntax, then it is an error to provide an argument not having that - particular type or syntax. - -\item If a specification describes a function then that function might be - implemented as a generic function; it is an error to attempt to (re)define - it as a generic function, or to attempt to add methods to it. A function - specified as being a generic function will certainly be so; if user methods - are permitted on the generic function then this will be specified. - -\item Where a class precedence list is specified, either explicitly or - implicitly by a class hierarchy, the implementation may include additional - superclasses not specified here. 
Such additional superclasses will not - affect the order of specified classes in the class precedence lists either - of specified classes themselves or of user-defined subclasses of specified - classes. - -\item Unless otherwise specified, generic functions use the standard method - combination. - -\item The specifications for methods are frequently brief; they should be - read in conjunction with and in the context of the specification for the - generic function and specializing classes, if any. - -\item An object $o$ is a \emph{direct instance} of a class $c$ if @|(eq - (class-of $o$) $c$)|; $o$ is an \emph{instance} of $c$ if it is a direct - instance of any subclass of $c$. - -\item If a class is specified as being \emph{abstract} then it is an error to - construct direct instances of it, e.g., using @|make-instance|. - -\item If an object is specified as being \emph{immutable} then it is an error - to mutate it, e.g., using @|(setf (slot-value \ldots) \ldots)|. Programs - may rely on immutable objects retaining their state. - -\item A value is \emph{fresh} if it is guaranteed to be not @|eql| to any - previously existing value. - -\item Unless otherwise specified, it is an error to change the class of an - instance of any class described here; and it is an error to change the - class of an object to a class described here. - -\end{itemize} - -\subsection{Format of the entries} \label{sec:proto.conventions.format} - -Most symbols defined by the protocol have their own entries. An entry begins -with a header line, showing a synopsis of the symbol on the left, and the -category (function, class, macro, etc.) on the right. - -\begin{describe}{fun}{example-function @ - \&optional @ - \&rest @ - \&key :keyword} - The synopsis for a function, generic function or method describes the - function's lambda-list using the usual syntax. 
Note that keyword arguments - are shown by naming their keywords; in the description, the value passed - for the keyword argument @|:keyword| is shown as @. - - For a method, specializers are shown using the usual @|defmethod| syntax, - e.g., - \begin{quote} \sffamily - some-generic-function ((@ list) @) - \end{quote} -\end{describe} - -\begin{describe}{mac}{example-macro - ( @{ @ @! (@ @
) @}^* ) \\ \push - @[[ @^* @! @ @]] \\ - @^*} - The synopsis for a macro describes the acceptable syntax using the - following notation. - \begin{itemize} - \item Literal symbols, e.g., keywords and parenthesis, are shown in - @|monospace|. - \item Metasyntactic variables are shown in @. - \item Items are grouped together by braces `@{ $\dots$ @}'. The notation - `@{ $\dots$ @}^*' indicates that the enclosed items may be repeated zero - or more times; `@{ $\dots$ @}^+' indicates that the enclosed items may be - repeated one or more times. This notation may be applied to a single - item without the braces. - \item Optional items are shown enclosed in brackets `@[ $\dots$ @]'. - \item Alternatives are separated by vertical bars `@!'; the vertical bar - has low precedence, so alternatives extend as far as possible between - bars and up to the enclosing brackets if any. - \item A sequence of alternatives enclosed in double-brackets `@[[ $\ldots$ - @]]' indicates that the alternatives may occur in any order, but each may - appear at most once unless marked by a star. - \end{itemize} - For example, the notation at the head of this example describes syntax - for @|let|. -\end{describe} - -\begin{describe}{cls}{example-class (direct-super other-direct-super) \&key - :initarg} - The synopsis for a class lists the class's direct superclasses, and the - acceptable initargs in the form of a lambda-list. The initargs may be - passed to @|make-instance| when constructing an instance of the class or a - subclass of it. If instances of the class may be reinitialized, or if - objects can be changed to be instances of the class, then these initargs - may also be passed to @|reinitialize-instance| and/or @|change-class| as - applicable; the class description will state explicitly when these - operations are allowed. 
-\end{describe} +\chapter{C language utilities} \label{ch:clang} %%%-------------------------------------------------------------------------- -\section{C type representation} \label{sec:proto.c-types} +\section{C type representation} \label{sec:clang.c-types} -\subsection{Overview} \label{sec:proto.c-types.over} +\subsection{Overview} \label{sec:clang.c-types.over} The Sod translator represents C types in a fairly simple and direct way. However, because it spends a fair amount of its time dealing with C types, it @@ -178,11 +38,11 @@ The class hierarchy is shown in~\xref{fig:proto.c-types}. \begin{figure} \centering \parbox{10pt}{\begin{tabbing} - @|c-type| \\ \push - @|qualifiable-c-type| \\ \push - @|simple-c-type| \\ \push + @|c-type| \\ \ind + @|qualifiable-c-type| \\ \ind + @|simple-c-type| \\ \ind @|c-class-type| \- \\ - @|tagged-c-type| \\ \push + @|tagged-c-type| \\ \ind @|c-struct-type| \\ @|c-union-type| \\ @|c-enum-type| \- \\ @@ -228,7 +88,7 @@ similar names. Neither generic function defines a default primary method; subclasses of @|c-type| must define their own methods in order to print correctly. -\subsection{The C type root class} \label{sec:proto.c-types.root} +\subsection{The C type root class} \label{sec:clang.c-types.root} \begin{describe}{cls}{c-type ()} The class @|c-type| marks the root of the built-in C type hierarchy. @@ -240,7 +100,7 @@ Neither generic function defines a default primary method; subclasses of The class @|c-type| is abstract. \end{describe} -\subsection{C type S-expression notation} \label{sec:proto.c-types.sexp} +\subsection{C type S-expression notation} \label{sec:clang.c-types.sexp} The S-expression representation of a type is described syntactically as a type specifier. Type specifiers fit into two syntactic categories. @@ -254,13 +114,14 @@ type specifier. Type specifiers fit into two syntactic categories. arguments to the type operator. 
\end{itemize} -\begin{describe}{mac}{c-type @ @to @} +\begin{describe}{mac}{c-type @ @> @} Evaluates to a C type object, as described by the type specifier @. \end{describe} -\begin{describe}{mac}{ - defctype @{ @ @! (@^*) @} @ @to @} +\begin{describe}{mac} + {defctype @{ @ @! (@ @^*) @} @ + @> @} Defines a new symbolic type specifier @; if a list of @s is given, then all are defined in the same way. The type constructed by using any of the @s is as described by the type specifier @. @@ -270,16 +131,14 @@ type specifier. Type specifiers fit into two syntactic categories. @ is used in a type specifier. \end{describe} -\begin{describe}{mac}{c-type-alias @ @^* @to @} +\begin{describe}{mac}{c-type-alias @ @^* @> @} Defines each @ as being a type operator identical in behaviour to @. If @ is later redefined then the behaviour of the @es changes too. \end{describe} -\begin{describe}{mac}{% - define-c-type-syntax @ @ \\ \push - @^* \-\\ - @to @} +\begin{describe}{mac} + {define-c-type-syntax @ @ @^* @> @} Defines the symbol @ as a new type operator. When a list of the form @|(@ @^*)| is used as a type specifier, the @s are bound to fresh variables according to @ (a destructuring @@ -291,12 +150,12 @@ type specifier. Type specifiers fit into two syntactic categories. type specifiers among its arguments. \end{describe} -\begin{describe}{fun}{expand-c-type-spec @ @to @} +\begin{describe}{fun}{expand-c-type-spec @ @> @} Returns the Lisp form that @|(c-type @)| would expand into. \end{describe} -\begin{describe}{gf}{% - print-c-type @ @ \&optional @ @} +\begin{describe}{gf} + {print-c-type @ @ \&optional @ @} Print the C type object @ to @ in S-expression form. The @ and @ arguments may be interpreted in any way which seems appropriate: they are provided so that @|print-c-type| may be called via @@ -307,14 +166,15 @@ type specifier. Type specifiers fit into two syntactic categories. default method. 
\end{describe} -\subsection{Comparing C types} \label{sec:proto.c-types.cmp} +\subsection{Comparing C types} \label{sec:clang.c-types.cmp} It is necessary to compare C types for equality, for example when checking argument lists for methods. This is done by @|c-type-equal-p|. -\begin{describe}{gf}{c-type-equal-p @_1 @_2 @to @} - The generic function @|c-type-equal-p| compares two C types @_1 and - @_2 for equality; it returns true if the two types are equal and +\begin{describe}{gf} + {c-type-equal-p @_1 @_2 @> @} + The generic function @|c-type-equal-p| compares two C types @_1 and + @_2 for equality; it returns true if the two types are equal and false if they are not. Two types are equal if they are structurally similar, where this property @@ -323,24 +183,24 @@ argument lists for methods. This is done by @|c-type-equal-p|. The generic function @|c-type-equal-p| uses the @|and| method combination. - \begin{describe}{meth}{c-type-equal-p @_1 @_2} + \begin{describe}{meth}{c-type-equal-p @_1 @_2} A default primary method for @|c-type-equal-p| is defined. It simply returns @|nil|. This way, methods can specialize on both arguments without fear that a call will fail because no methods are applicable. \end{describe} - \begin{describe}{ar-meth}{c-type-equal-p @_1 @_2} + \begin{describe}{ar-meth}{c-type-equal-p @_1 @_2} A default around-method for @|c-type-equal-p| is defined. It returns - true if @_1 and @_2 are @|eql|; otherwise it delegates to the - primary methods. Since several common kinds of C types are interned, + true if @_1 and @_2 are @|eql|; otherwise it delegates to + the primary methods. Since several common kinds of C types are interned, this is a common case worth optimizing. 
\end{describe} \end{describe} -\subsection{Outputting C types} \label{sec:proto.c-types.output} +\subsection{Outputting C types} \label{sec:clang.c-types.output} -\begin{describe}{gf}{pprint-c-type @ @ @} +\begin{describe}{gf}{pprint-c-type @ @ @} The generic function @|pprint-c-type| pretty-prints to @ a C-syntax - declaration of an object or function of type @. The result is + declaration of an object or function of type @. The result is written to @. A C declaration has two parts: a sequence of \emph{declaration specifiers} @@ -385,7 +245,7 @@ argument lists for methods. This is done by @|c-type-equal-p|. Every concrete subclass of @|c-type| is expected to provide a primary method on this function. There is no default primary method. - \begin{describe}{ar-meth}{pprint-c-type @ @ @} + \begin{describe}{ar-meth}{pprint-c-type @ @ @} A default around method is defined on @|pprint-c-type| which `canonifies' non-function @ arguments. In particular: \begin{itemize} @@ -404,9 +264,8 @@ argument lists for methods. This is done by @|c-type-equal-p|. specifiers. The precise details are subject to change. \end{describe} -\begin{describe}{mac}{% - maybe-in-parens (@ @) \\ \push - @^*} +\begin{describe}{mac} + {maybe-in-parens (@ @) @^*} The @ is evaluated, and then the @s are evaluated in sequence within a pretty-printer logical block writing to the stream named by the symbol @. If the @ evaluates to nil, then @@ -419,7 +278,7 @@ argument lists for methods. This is done by @|c-type-equal-p|. \end{describe} \subsection{Type qualifiers and qualifiable types} -\label{sec:proto.ctypes.qual} +\label{sec:clang.ctypes.qual} \begin{describe}{cls}{qualifiable-c-type (c-type) \&key :qualifiers} The class @|qualifiable-c-type| describes C types which can bear @@ -437,18 +296,18 @@ argument lists for methods. This is done by @|c-type-equal-p|. The class @|qualifiable-c-type| is abstract. 
\end{describe} -\begin{describe}{gf}{c-type-qualifiers @ @to @} - Returns the qualifiers of the @|qualifiable-c-type| instance @ as an - immutable list. +\begin{describe}{gf}{c-type-qualifiers @ @> @} + Returns the qualifiers of the @|qualifiable-c-type| instance @ as + an immutable list. \end{describe} -\begin{describe}{fun}{qualify-type @ @} - The argument @ must be an instance of @|qualifiable-c-type|, +\begin{describe}{fun}{qualify-type @ @ @> @} + The argument @ must be an instance of @|qualifiable-c-type|, currently bearing no qualifiers, and @ a list of qualifier keywords. The result is a C type object like @ except that it bears the given @. - The @ is not modified. If @ is interned, then the returned + The @ is not modified. If @ is interned, then the returned type will be interned. \end{describe} @@ -458,7 +317,7 @@ argument lists for methods. This is done by @|c-type-equal-p|. non-null then the final character of the returned string will be a space. \end{describe} -\subsection{Leaf types} \label{sec:proto.c-types.leaf} +\subsection{Leaf types} \label{sec:clang.c-types.leaf} A \emph{leaf type} is a type which is not defined in terms of another type. In Sod, the leaf types are @@ -522,19 +381,20 @@ In Sod, the leaf types are \label{tab:proto.c-types.simple} \end{table} -\begin{describe}{fun}{make-simple-type @ \&optional @} +\begin{describe}{fun} + {make-simple-type @ \&optional @ @> @} Return the (unique interned) simple C type object for the C type whose name is @ (a string) and which has the given @ (a list of keywords). \end{describe} -\begin{describe}{gf}{c-type-name @} - Returns the name of a @|simple-c-type| instance @ as an immutable +\begin{describe}{gf}{c-type-name @ @> @} + Returns the name of a @|simple-c-type| instance @ as an immutable string. \end{describe} -\begin{describe}{mac}{% - define-simple-c-type @{ @ @! (@^*) @} @} +\begin{describe}{mac} + {define-simple-c-type @{ @ @! (@^*) @} @ @> @} Define type specifiers for a new simple C type. 
Each symbol @ is defined as a symbolic type specifier for the (unique interned) simple C type whose name is the value of @. Further, each @ is @@ -559,13 +419,22 @@ In Sod, the leaf types are structs and unions. \end{boxy} -\begin{describe}{gf}{c-tagged-type-kind @} - Returns a symbol classifying the tagged @: one of @|enum|, @|struct| - or @|union|. User-defined subclasses of @|tagged-c-type| should return - their own classification symbols. It is intended that @|(string-downcase - (c-tagged-type-kind @))| be valid C syntax.\footnote{% +\begin{describe}{gf}{c-tagged-type-kind @ @> @} + Returns a keyword classifying the tagged @: one of @|:enum|, + @|:struct| or @|:union|. User-defined subclasses of @|tagged-c-type| + should return their own classification symbols. It is intended that + @|(string-downcase (c-tagged-type-kind @))| be valid C + syntax.\footnote{% Alas, C doesn't provide a syntactic category for these keywords; \Cplusplus\ calls them a @.} % + There is a method defined for each of the built-in tagged type classes + @|c-struct-type|, @|c-union-type| and @|c-enum-type|. +\end{describe} + +\begin{describe}{gf}{kind-c-tagged-type @ @> @} + This is not quite the inverse of @|c-tagged-type-kind|. Given a keyword + naming a kind of tagged type, return the name of the corresponding C + type class as a symbol. \end{describe} \begin{describe}{cls}{c-enum-type (tagged-c-type) \&key :qualifiers :tag} @@ -576,7 +445,8 @@ In Sod, the leaf types are interned) enumerated type with the given @ and @s (all evaluated). \end{describe} -\begin{describe}{fun}{make-enum-type @ \&optional @} +\begin{describe}{fun} + {make-enum-type @ \&optional @ @> @} Return the (unique interned) C type object for the enumerated C type whose tag is @ (a string) and which has the given @ (a list of keywords). @@ -590,7 +460,8 @@ In Sod, the leaf types are interned) structured type with the given @ and @s (all evaluated). 
\end{describe} -\begin{describe}{fun}{make-struct-type @ \&optional @} +\begin{describe}{fun} + {make-struct-type @ \&optional @ @> @} Return the (unique interned) C type object for the structured C type whose tag is @ (a string) and which has the given @ (a list of keywords). @@ -605,24 +476,31 @@ In Sod, the leaf types are interned) union type with the given @ and @s (all evaluated). \end{describe} -\begin{describe}{fun}{make-union-type @ \&optional @} +\begin{describe}{fun} + {make-union-type @ \&optional @ @> @} Return the (unique interned) C type object for the union C type whose tag is @ (a string) and which has the given @ (a list of keywords). \end{describe} -\subsection{Pointers and arrays} \label{sec:proto.c-types.ptr-array} +\subsection{Compound C types} \label{sec:code.c-types.compound} + +Some C types are \emph{compound types}: they're defined in terms of existing +types. The classes which represent compound types implement a common +protocol. -Pointers and arrays are \emph{compound types}: they're defined in terms of -existing types. A pointer describes the type of objects it points to; an -array describes the type of array element. -\begin{describe}{gf}{c-type-subtype @} - Returns the underlying type of a compound type @. Precisely what - this means depends on the class of @. +\begin{describe}{gf}{c-type-subtype @ @> @} + Returns the underlying type of a compound type @. Precisely what + this means depends on the class of @. \end{describe} -\begin{describe}{cls}{c-pointer-type (qualifiable-c-type) - \&key :qualifiers :subtype} +\subsection{Pointer types} \label{sec:clang.c-types.pointer} + +Pointers are compound types. The subtype of a pointer type is the type it points +to. + +\begin{describe}{cls} + {c-pointer-type (qualifiable-c-type) \&key :qualifiers :subtype} Represents a C pointer type. An instance denotes the C type @ @|*|@. @@ -639,12 +517,20 @@ array describes the type of array element. 
characters; the symbol @|const-string| is a type specifier for the type pointer to constant characters. \end{describe} -\begin{describe}{fun}{make-pointer-type @ \&optional @} + +\begin{describe}{fun} + {make-pointer-type @ \&optional @ + @> @} Return an object describing the type of qualified pointers to @. If @ is interned, then the returned pointer type object is interned also. \end{describe} +\subsection{Array types} \label{sec:clang.c-types.array} + +Arrays implement the compound-type protocol. The subtype of an array is the +array element type. + \begin{describe}{cls}{c-array-type (c-type) \&key :subtype :dimensions} Represents a multidimensional C array type. The @ are a list of dimension specifiers $d_0$, $d_1$, \ldots, $d_{n-1}$; an instance then @@ -668,23 +554,66 @@ array describes the type of array element. single-dimensional array with unspecified extent. The synonyms @|array| and @|vector| may be used in place of the brackets @`[]'. \end{describe} -\begin{describe}{fun}{make-array-type @ @} + +\begin{describe}{fun} + {make-array-type @ @ @> @} Return an object describing the type of arrays with given @ and with element type @ (an instance of @|c-type|). The @ argument is a list whose elements are strings or nil; see the description of the class @|c-array-type| above for details. \end{describe} -\begin{describe}{gf}{c-array-dimensions @} - Returns the dimensions of @, an array type, as an immutable list. + +\begin{describe}{gf}{c-array-dimensions @ @> @} + Returns the dimensions of @, an array type, as an immutable list. 
+\end{describe} + +\subsection{Function types} \label{sec:clang.c-types.fun} + +\begin{describe}{cls}{argument} +\end{describe} + +\begin{describe}{fun}{argumentp @ @> @} +\end{describe} + +\begin{describe}{fun}{make-argument @ @ @> @} +\end{describe} + +\begin{describe}{fun}{argument-name @ @> @} +\end{describe} + +\begin{describe}{fun}{argument-type @ @> @} \end{describe} -\subsection{Function types} \label{sec:proto.c-types.fun} +\begin{describe}{fun} + {commentify-argument-name @ @> @} +\end{describe} \begin{describe}{cls}{c-function-type (c-type) \&key :subtype :arguments} Represents C function types. An instance denotes the C type of a C - function which + function which FIXME +\end{describe} + +\begin{describe}{fun} + {c-function-arguments @ @> @} +\end{describe} + +\begin{describe}{fun} + {make-c-type @ @ @> @} +\end{describe} + +\begin{describe}{fun} + {commentify-argument-names @ @> @} +\end{describe} + +\begin{describe}{fun} + {commentify-function-type @ @> @} \end{describe} +\subsection{Parsing C types} \label{sec:clang.c-types.parsing} + +%%%-------------------------------------------------------------------------- +\section{Generating C code} \label{sec:clang.codegen} + %%%----- That's all, folks -------------------------------------------------- %%% Local variables: diff --git a/doc/concepts.tex b/doc/concepts.tex new file mode 100644 index 0000000..d554b51 --- /dev/null +++ b/doc/concepts.tex @@ -0,0 +1,42 @@ +%%% -*-latex-*- +%%% +%%% Conceptual background +%%% +%%% (c) 2015 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Sensible Object Design, an object system for C. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. 
+%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +\chapter{Concepts} + +\section{Classes and slots} + +\section{Messages and methods} + +\section{Metaclasses} + +\section{Modules} + +%%%----- That's all, folks -------------------------------------------------- + +%%% Local variables: +%%% mode: LaTeX +%%% TeX-master: "sod.tex" +%%% TeX-PDF-mode: t +%%% End: diff --git a/doc/cutting-room-floor.tex b/doc/cutting-room-floor.tex new file mode 100644 index 0000000..c8f241b --- /dev/null +++ b/doc/cutting-room-floor.tex @@ -0,0 +1,489 @@ +%%% -*-latex-*- +%%% +%%% Conceptual background +%%% +%%% (c) 2015 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Sensible Object Design, an object system for C. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. +%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ +\chapter{Cutting-room floor} + +%%%-------------------------------------------------------------------------- +\section{Generated names} + +The generated names for functions and objects related to a class are +constructed systematically so as not to interfere with each other. The rules +on class, slot and message naming exist so as to ensure that the generated +names don't collide with each other. + +The following notation is used in this section. +\begin{description} +\item[@] The full name of the `focus' class: the one for which we are + generating names. +\item[@] The nickname of a superclass. +\item[@] The nickname of the chain-head class of the chain + in question. +\end{description} + +\subsection{Instance layout} + +%%%-------------------------------------------------------------------------- +\section{Class objects} + +\begin{listing} +typedef struct SodClass__ichain_obj SodClass; + +struct sod_chain { + size_t n_classes; /* Number of classes in chain */ + const SodClass *const *classes; /* Vector of classes, head first */ + size_t off_ichain; /* Offset of ichain from instance base */ + const struct sod_vtable *vt; /* Vtable pointer for chain */ + size_t ichainsz; /* Size of the ichain structure */ +}; + +struct sod_vtable { + SodClass *_class; /* Pointer to instance's class */ + size_t _base; /* Offset to instance base */ +}; + +struct SodClass__islots { + + /* Basic information */ + const char *name; /* The class's name as a string */ + const char *nick; /* The nickname as a string */ + + /* Instance allocation and initialization */ + size_t instsz; /* Instance layout size in bytes */ + void *(*imprint)(void *); /* Stamp instance with vtable ptrs */ + void *(*init)(void *); /* Initialize instance */ + + /* Superclass structure */ + size_t n_supers; /* Number of direct superclasses */ + const SodClass *const *supers; /* Vector of direct superclasses */ + size_t n_cpl; /* Length of class precedence list */ + const SodClass *const *cpl; /* Vector for class 
precedence list */ + + /* Chain structure */ + const SodClass *link; /* Link to next class in chain */ + const SodClass *head; /* Pointer to head of chain */ + size_t level; /* Index of class in its chain */ + size_t n_chains; /* Number of superclass chains */ + const struct sod_chain *chains; /* Vector of chain structures */ + + /* Layout */ + size_t off_islots; /* Offset of islots from ichain base */ + size_t islotsz; /* Size of instance slots */ +}; + +struct SodClass__ichain_obj { + const struct SodClass__vt_obj *_vt; + struct SodClass__islots cls; +}; + +struct sod_instance { + struct sod_vtable *_vt; +}; +\end{listing} + +\begin{listing} +void *sod_convert(const SodClass *cls, const void *obj) +{ + const struct sod_instance *inst = obj; + const SodClass *real = inst->_vt->_class; + const struct sod_chain *chain; + size_t i, index; + + for (i = 0; i < real->cls.n_chains; i++) { + chain = &real->cls.chains[i]; + if (chain->classes[0] == cls->cls.head) { + index = cls->cls.level; + if (index < chain->n_classes && chain->classes[index] == cls) + return ((char *)obj - inst->_vt->_base + chain->off_ichain); + else + return (0); + } + } + return (0); +} +\end{listing} + +%%%-------------------------------------------------------------------------- +\section{Classes} +\label{sec:class} + +\subsection{Classes and superclasses} \label{sec:class.defs} + +A @ must list one or more existing classes to be the +\emph{direct superclasses} for the new class being defined. We make the +following definitions. +\begin{itemize} +\item The \emph{superclasses} of a class consist of the class itself together + with the superclasses of its direct superclasses. +\item The \emph{proper superclasses} of a class are its superclasses other + than itself. +\item If $C$ is a (proper) superclass of $D$ then $D$ is a (\emph{proper}) + \emph{subclass} of $C$. +\end{itemize} +The predefined class @|SodObject| has no direct superclasses; it is unique in +this respect. All classes are subclasses of @|SodObject|. 
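The superclass definitions above translate fairly directly into code. The following C sketch is illustrative only: the record `struct toy_class`, the functions `toy_superclassp` and `toy_proper_superclassp`, and the example classes are hypothetical names invented here, not part of Sod's runtime. A real implementation would more likely consult a precomputed class precedence list than walk the superclass graph recursively.

```c
#include <stddef.h>

/* Hypothetical, much-simplified class record (not Sod's SodClass). */
struct toy_class {
  const char *name;                     /* Class name, for diagnostics */
  size_t n_supers;                      /* Number of direct superclasses */
  const struct toy_class *supers[4];    /* Direct superclasses */
};

/* Return nonzero if SUPER is a superclass of CLS: that is, if it is CLS
 * itself, or a superclass of one of CLS's direct superclasses.
 */
int toy_superclassp(const struct toy_class *cls,
                    const struct toy_class *super)
{
  size_t i;

  if (cls == super) return (1);
  for (i = 0; i < cls->n_supers; i++)
    if (toy_superclassp(cls->supers[i], super)) return (1);
  return (0);
}

/* A proper superclass is a superclass other than the class itself. */
int toy_proper_superclassp(const struct toy_class *cls,
                           const struct toy_class *super)
  { return (cls != super && toy_superclassp(cls, super)); }

/* Example hierarchy, assumed for illustration: toy_object plays the part
 * of the root class; toy_a and toy_b are its direct subclasses; and toy_c
 * has both toy_a and toy_b as direct superclasses.
 */
const struct toy_class toy_object = { "Object", 0, { 0 } };
const struct toy_class toy_a = { "A", 1, { &toy_object } };
const struct toy_class toy_b = { "B", 1, { &toy_object } };
const struct toy_class toy_c = { "C", 2, { &toy_a, &toy_b } };
```

With this hierarchy, every class is a superclass of itself but not a proper superclass of itself, and `toy_object` is a superclass of all four classes, mirroring the position of the root class in the text.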
+ +\subsection{The class precedence list} \label{sec:class.cpl} + +Let $C$ be a class. The superclasses of $C$ form a directed graph, with an +edge from each class to each of its direct superclasses. This is the +\emph{superclass graph of $C$}. + +In order to resolve inheritance of items, we define a \emph{class precedence + list} (or CPL) for each class, which imposes a total order on that class's +superclasses. The default algorithm for computing the CPL is the \emph{C3} +algorithm \cite{fixme-c3}, though extensions may implement other algorithms. + +The default algorithm works as follows. Let $C$ be the class whose CPL we +are to compute. Let $X$ and $Y$ be two of $C$'s superclasses. +\begin{itemize} +\item $C$ must appear first in the CPL. +\item If $X$ appears before $Y$ in the CPL of one of $C$'s direct + superclasses, then $X$ appears before $Y$ in $C$'s CPL. +\item If the above rules don't suffice to order $X$ and $Y$, then whichever + of $X$ and $Y$ has a subclass which appears further left in the list of + $C$'s direct superclasses will appear earlier in the CPL. +\end{itemize} +This last rule is sufficient to disambiguate because if both $X$ and $Y$ are +superclasses of the same direct superclass of $C$ then that direct +superclass's CPL will order $X$ and $Y$. + +We say that \emph{$X$ is more specific than $Y$ as a superclass of $C$} if +$X$ is earlier than $Y$ in $C$'s class precedence list. If $C$ is clear from +context then we omit it, saying simply that $X$ is more specific than $Y$. + +\subsection{Instances and metaclasses} \label{sec:class.meta} + +A class defines the structure and behaviour of its \emph{instances}: run-time +objects created (possibly) dynamically. An instance is an instance of only +one class, though structurally it may be used in place of an instance of any +of that class's superclasses. It is possible, with care, to change the class +of an instance at run-time. 
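As an aside on the default CPL algorithm described in the class precedence list subsection above, here is a minimal C sketch of the widely used `merge' formulation of C3. It is not the translator's code (the translator is written in Common Lisp) and may differ from it in detail. The names `struct toy_class`, `struct toy_seq`, `in_tail` and `toy_cpl`, the fixed size limits, and the example classes are all assumptions made for this illustration.

```c
#include <stddef.h>

#define TOY_MAX_SUPERS 4                /* Assumed limit on direct supers */
#define TOY_MAX_CPL 16                  /* Assumed limit on CPL length */

/* Hypothetical, much-simplified class record (not Sod's SodClass). */
struct toy_class {
  const char *name;
  size_t n_supers;
  const struct toy_class *supers[TOY_MAX_SUPERS]; /* Leftmost first */
};

/* One input list for the merge, with a cursor marking its current head. */
struct toy_seq {
  const struct toy_class *v[TOY_MAX_CPL];
  size_t n, pos;
};

/* Return nonzero if C appears in S strictly after S's current head. */
static int in_tail(const struct toy_seq *s, const struct toy_class *c)
{
  size_t i;

  for (i = s->pos + 1; i < s->n; i++)
    if (s->v[i] == c) return (1);
  return (0);
}

/* Store the CPL of CLS in OUT and return its length, or return zero if
 * the superclass orderings are inconsistent.  The size limits above are
 * not checked.
 */
size_t toy_cpl(const struct toy_class *cls, const struct toy_class **out)
{
  struct toy_seq seqs[TOY_MAX_SUPERS + 1];
  size_t nseq = cls->n_supers + 1, n = 0, i, j;

  /* The inputs are the CPLs of the direct superclasses, in order ... */
  for (i = 0; i < cls->n_supers; i++) {
    seqs[i].n = toy_cpl(cls->supers[i], seqs[i].v);
    seqs[i].pos = 0;
    if (!seqs[i].n) return (0);
  }
  /* ... followed by the list of direct superclasses itself. */
  seqs[i].n = cls->n_supers; seqs[i].pos = 0;
  for (j = 0; j < cls->n_supers; j++) seqs[i].v[j] = cls->supers[j];

  out[n++] = cls;                       /* CLS always comes first */
  for (;;) {
    const struct toy_class *pick = 0;
    int remaining = 0;

    /* Take the first list head which is in no list's tail. */
    for (i = 0; i < nseq && !pick; i++) {
      if (seqs[i].pos >= seqs[i].n) continue;
      remaining = 1;
      for (j = 0; j < nseq; j++)
        if (in_tail(&seqs[j], seqs[i].v[seqs[i].pos])) break;
      if (j == nseq) pick = seqs[i].v[seqs[i].pos];
    }
    if (!pick) return (remaining ? 0 : n);

    /* Emit the chosen class and remove it from the head of every list. */
    out[n++] = pick;
    for (i = 0; i < nseq; i++)
      if (seqs[i].pos < seqs[i].n && seqs[i].v[seqs[i].pos] == pick)
        seqs[i].pos++;
  }
}

/* Example diamond, assumed for illustration. */
const struct toy_class toy_o = { "O", 0, { 0 } };
const struct toy_class toy_a = { "A", 1, { &toy_o } };
const struct toy_class toy_b = { "B", 1, { &toy_o } };
const struct toy_class toy_c = { "C", 2, { &toy_a, &toy_b } };
```

For the diamond above, calling `toy_cpl(&toy_c, cpl)` fills `cpl` with C, A, B, O: the shared superclass O is held back until both A and B have been placed, and A precedes B because it is further left in C's list of direct superclasses. A hierarchy whose direct superclasses disagree about the order of two classes makes the function return zero.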
+ +Classes are themselves represented as instances -- called \emph{class + objects} -- in the running program. Being instances, they have a class, +called the \emph{metaclass}. The metaclass defines the structure and +behaviour of the class object. + +The predefined class @|SodClass| is the default metaclass for new classes. +@|SodClass| has @|SodObject| as its only direct superclass. @|SodClass| is +its own metaclass. + +To make matters more complicated, Sod has \emph{two} distinct metalevels: as +well as the runtime metalevel, as discussed above, there's a compile-time +metalevel hosted in the Sod translator. Since Sod is written in Common Lisp, +a Sod class's compile-time metaclass is a CLOS class. The usual compile-time +metaclass is @|sod-class|. The compile-time metalevel is the subject of +\xref{ch:api}. + +\subsection{Items and inheritance} \label{sec:class.inherit} + +A class definition also declares \emph{slots}, \emph{messages}, +\emph{initializers} and \emph{methods} -- collectively referred to as +\emph{items}. In addition to the items declared in the class definition -- +the class's \emph{direct items} -- a class also \emph{inherits} items from +its superclasses. + +The precise rules for item inheritance vary according to the kinds of items +involved. + +Some object systems have a notion of `repeated inheritance': if there are +multiple paths in the superclass graph from a class to one of its +superclasses then items defined in that superclass may appear duplicated in +the subclass. Sod does not have this notion. + +\subsubsection{Slots} \label{sec:class.inherit.slots} +A \emph{slot} is a unit of state. In other object systems, slots may be +called `fields', `member variables', or `instance variables'. + +A slot has a \emph{name} and a \emph{type}. The name serves only to +distinguish the slot from other direct slots defined by the same class. A +class inherits all of its proper superclasses' slots. 
Slots inherited from +superclasses do not conflict with each other or with direct slots, even if +they have the same names. + +At run-time, each instance of the class holds a separate value for each slot, +whether direct or inherited. Changing the value of an instance's slot +doesn't affect other instances. + +\subsubsection{Initializers} \label{sec:class.inherit.init} +Mumble. + +\subsubsection{Messages} \label{sec:class.inherit.messages} +A \emph{message} is the stimulus for behaviour. In Sod, a class must define, +statically, the name and format of the messages it is able to receive and the +values it will return in reply. In this respect, a message is similar to +`abstract member functions' or `interface member functions' in other object +systems. + +Like slots, a message has a \emph{name} and a \emph{type}. Again, the name +serves only to distinguish the message from other direct messages defined by +the same class. Messages inherited from superclasses do not conflict with +each other or with direct messages, even if they have the same name. + +At run-time, one sends a message to an instance by invoking a function +obtained from the instance's \emph{vtable}: \xref{sec:fixme-vtable}. + +\subsubsection{Methods} \label{sec:class.inherit.methods} +A \emph{method} is a unit of behaviour. In other object systems, methods may +be called `member functions'. + +A method is associated with a message. When a message is received by an +instance, all of the methods associated with that message on the instance's +class or any of its superclasses are \emph{applicable}. The details of how +the applicable methods are invoked are described fully in +\xref{sec:fixme-method-combination}. + +\subsection{Chains and instance layout} \label{sec:class.layout} + +C is a rather low-level language, and in particular it exposes details of the +way data is laid out in memory. 
Since an instance of a class~$C$ should be +(at least in principle) usable anywhere an instance of some superclass $B +\succeq C$ is expected, this implies that an instance of the subclass $C$ +needs to contain within it a complete instance of each superclass $B$, laid +out according to the rules of instances of $B$, so that if we have (the +address of) an instance of $C$, we can easily construct a pointer to a thing +which looks like an instance of $B$ contained within it. + +Specifically, the information we need to retain for an instance of a +class~$C$ is: +\begin{itemize} +\item the values of each of the slots defined by $C$, including those defined + by superclasses; +\item information which will let us convert a pointer to $C$ into a pointer + to any superclass $B \succeq C$; +\item information which will let us call the appropriate effective method for + each message defined by $C$, including those defined by superclasses; and +\item some additional meta-level information, such as how to find the class + object for $C$ given (the address of) one of its instances. +\end{itemize} + +Observe that, while each distinct instance must clearly have its own storage +for slots, all instances of $C$ can share a single copy of the remaining +information. The individual instance only needs to keep a pointer to this +shared table, which, inspired by the similar structure in many \Cplusplus\ +ABIs, is called a \emph{vtable}. + +The easiest approach would be to decide that instances of $C$ are exactly +like instances of $B$, only with extra space at the end for the extra slots +which $C$ defines over and above those already existing in $B$. Conversion +is then trivial: a pointer to an instance of $C$ can be converted to a +pointer to an instance of some superclass $B$ simply by casting. 
Even though +the root class @|SodObject| doesn't have any slots at all, its instances will +still need a vtable so that you can find its class object: the address of the +vtable therefore needs to be at the very start of the instance structure. +Again, a vtable for a superclass would have a vtable for each of its +superclasses as a prefix, with new items added afterwards. + +This appealing approach works well for an object system which only permits +single inheritance of both state and behaviour. Alas, it breaks down when +multiple inheritance is allowed: $C$ can be a subclass of both $B$ and $B'$, +even though $B$ is not a subclass of $B'$, nor \emph{vice versa}; so, in +general, $B$'s instance structure will not be a prefix of $B'$'s, nor will +$B'$'s be a prefix of $B$'s, and therefore $C$ cannot have both $B$ and $B'$ +as a prefix. + +A (non-root) class may -- though need not -- have a distinguished \emph{link} +superclass, which need not be a direct superclass. Furthermore, each +class~$C$ must satisfy the \emph{chain condition}: for any superclass $A$ of +$C$, there can be at most one other superclass of $C$ whose link superclass +is $A$.\footnote{% + That is, it's permitted for two classes $B$ and $B'$ to have the same link + superclass $A$, but $B$ and $B'$ can't then both be superclasses of the + same class $C$.} % +Therefore, the links partition the superclasses of~$C$ into nice linear +\emph{chains}, such that each superclass is a member of exactly one chain. +If a class~$B$ has a link superclass~$A$, then $B$'s \emph{level} is one more +than that of $A$; otherwise $B$ is called a \emph{chain head} and its level +is zero. If the classes in a chain are written in a list, chain head first, +then the level of each class gives its index in the list. + +Chains therefore allow us to recover some of the linearity properties which +made layout simple in the case of single inheritance. 
The instance structure +for a class $C$ contains a substructure for each of $C$'s superclass chains; +a pointer to an object of class $C$ actually points to the substructure for +the chain containing $C$. The order of these substructures is unimportant +for now.\footnote{% + The chains appear in the order in which their most specific classes appear + in $C$'s class precedence list. This guarantees that the chain containing + $C$ itself appears first, so that a pointer to $C$'s instance structure is + actually a pointer to $C$'s chain substructure. Apart from that, it's a + simple, stable, but basically arbitrary choice which can't be changed + without breaking the ABI.} % +The substructure for each chain begins with a pointer to a vtable, followed +by a structure for each superclass in the chain containing the slots defined +by that superclass, with the chain head (least specific class) first. + +Suppose we have a pointer to (static) type $C$, and want to convert it into a +pointer to some superclass $B$ of $C$ -- an \emph{upcast}.\footnote{% + In the more general case, we have a pointer to static type $C$, which + actually points to an object of some subclass $D$ of $C$, and want to + convert it into a pointer to type $B$. Such a conversion is called a + \emph{downcast} if $B$ is a subclass of $C$, or a \emph{cross-cast} + otherwise. Downcasts and cross-casts require complicated run-time + checking, and will fail unless $B$ is a superclass of $D$.} % +If $B$ is in the same chain as $C$ -- an \emph{in-chain upcast} -- then the +pointer value is already correct and it's only necessary to cast it +appropriately. Otherwise -- a \emph{cross-chain upcast} -- the pointer needs +to be adjusted to point to a different chain substructure. Since the lengths +and relative positions of the chain substructures vary between classes, the +adjustments are stored in the vtable. Cross-chain upcasts are therefore a +bit slower than in-chain upcasts. 
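The difference between the two kinds of upcast can be sketched in C. This is only an illustration under stated assumptions: a class @|c| with link superclass @|a| and a second direct superclass @|b| which heads its own chain. All of the @|toy_|\dots\ names, and the two-member vtable, are hypothetical; the structures which the translator really generates are richer and are named differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chain vtable: only the offsets needed for upcasts. */
struct toy_vtable {
  size_t base;   /* offset of this chain substructure in the instance */
  size_t off_b;  /* offset of the chain headed by `b' in the instance */
};

/* Slots contributed by each class. */
struct toy_a_slots { int x; };
struct toy_b_slots { int y; };
struct toy_c_slots { int z; };

/* Chain substructures: a vtable pointer, then slots, chain head first. */
struct toy_a_ichain { const struct toy_vtable *vt; struct toy_a_slots a; };
struct toy_b_ichain { const struct toy_vtable *vt; struct toy_b_slots b; };
struct toy_c_ichain {
  const struct toy_vtable *vt;
  struct toy_a_slots a;  /* the chain head `a' comes first ... */
  struct toy_c_slots c;  /* ... followed by `c', which links to `a' */
};

/* A whole instance of `c': the chain containing `c', then the `b' chain. */
struct toy_c_ilayout {
  union { struct toy_a_ichain a; struct toy_c_ichain c; } a;
  struct toy_b_ichain b;
};

/* One vtable per chain, because `base' differs between chains. */
static const struct toy_vtable toy_c_vt_a = {
  offsetof(struct toy_c_ilayout, a), offsetof(struct toy_c_ilayout, b)
};
static const struct toy_vtable toy_c_vt_b = {
  offsetof(struct toy_c_ilayout, b), offsetof(struct toy_c_ilayout, b)
};

/* Fill in the vtable pointers and give the slots some example values. */
static void toy_c_init(struct toy_c_ilayout *obj)
{
  assert(obj);
  obj->a.c.vt = &toy_c_vt_a; obj->a.c.a.x = 1; obj->a.c.c.z = 3;
  obj->b.vt = &toy_c_vt_b; obj->b.b.y = 2;
}

/* In-chain upcast: the address is already right, so just cast. */
static struct toy_a_ichain *toy_c_to_a(struct toy_c_ichain *c)
  { return ((struct toy_a_ichain *)c); }

/* Cross-chain upcast: step back to the start of the instance, and then
 * forward to the target chain, using offsets found in the vtable. */
static struct toy_b_ichain *toy_c_to_b(struct toy_c_ichain *c)
{
  return ((struct toy_b_ichain *)((char *)c - c->vt->base + c->vt->off_b));
}
```

The in-chain conversion compiles to nothing at all, while the cross-chain conversion costs two loads through the vtable pointer and some pointer arithmetic, which is the sense in which it is `a bit slower'.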
+ +Each chain has its own separate vtable, because much of the metadata stored +in the vtable is specific to a particular chain. For example: +\begin{itemize} +\item offsets to other chains' substructures will vary depending on which + chain we start from; and +\item entry points to methods +\end{itemize} +%%%-------------------------------------------------------------------------- +\section{Superclass linearization} + +Before making any decisions about relationships between superclasses, Sod +\emph{linearizes} them, i.e., imposes a total order consistent with the +direct-subclass/superclass partial order. + +In the vague hope that we won't be completely bogged down in formalism by the +end of this, let's introduce some notation. We'll fix some class $z$ and +consider its set of superclasses $S(z) = \{ a, b, \dots \}$. We can define a +relation $c \prec_1 d$ if $c$ is a direct subclass of $d$, and extend it by +taking the reflexive, transitive closure: $c \preceq d$ if and only if +\begin{itemize} +\item $c = d$, or +\item there exists some class $x$ such that $c \prec_1 x$ and $x \preceq d$. +\end{itemize} +This is the `is-subclass-of' relation we've been using so far.\footnote{% + In some object systems, notably Flavors, this relation is allowed to fail + to be a partial order because of cycles in the class graph. I haven't + given a great deal of thought to how well Sod would cope with a cyclic + class graph.} % +We write $d \succeq c$ and say that $d$ is a superclass of $c$ if and only if +$c \preceq d$. + +The problem comes when we try to resolve inheritance questions. A class +should inherit behaviour from its superclasses; but, in a world of multiple +inheritance, which one do we choose? We get a simple version of this problem +when we try to resolve inheritance of slot initializers: only one initializer +can be inherited. + +We start by collecting into a set~$I$ the classes which define an initializer +for the slot. 
If $I$ contains both a class $x$ and one of $x$'s superclasses +then we should prefer $x$ and consider the superclass to be overridden. So +we should confine our attention to \emph{least} classes: a member $x$ of a +set $I$ is least, with respect to a particular partial order, if $y \preceq +x$ only when $x = y$. If there is a single least class in our set then we +have a winner. Otherwise we want some way to choose among them. + +This is not uncontroversial. Languages such as \Cplusplus\ refuse to choose +among least classes; instead, any program in which such a choice must be made +is simply declared erroneous. + +Simply throwing up our hands in horror at this situation is satisfactory when +we only wanted to pick one `winner', as we do for slot initializers. +However, method combination is a much more complicated business. We don't +want to pick just one winner: we want to order all of the applicable methods +in some way. Insisting that there is a clear winner at every step along the +chain is too much of an imposition. Instead, we \emph{linearize} the +classes. + +%%%-------------------------------------------------------------------------- +\section{Invariance, covariance, contravariance} + +In Sod, at least with regard to the existing method combinations, method +types are \emph{invariant}. This is not an accident, and it's not due to +ignorance. + +The \emph{signature} of a function, method or message describes its argument +and return-value types. If a method's arguments are an integer and a string, +and it returns a character, we might write its signature as +\[ (@|int|, @|string|) \to @|char| \] +In Sod, a method's arguments have to match its message's arguments precisely, +and the return type must either be @|void| -- for a dæmon method -- or again +match the message's return type. This is argument and return-type +\emph{invariance}. + +Some object systems allow methods with subtly different signatures to be +defined on a single message. 
In particular, since the idea is that instances +of a subclass ought to be broadly compatible~(see \xref{sec:phil.lsp}) with +existing code which expects instances of a superclass, we might be able to +get away with bending method signatures one way or another to permit this. + +\Cplusplus\ permits \emph{return-type covariance}, where a method's return +type can be a subclass of the return type specified by a less-specific +method. Eiffel allows \emph{argument covariance}, where a method's arguments +can be subclasses of the arguments specified by a less-specific +method.\footnote{% + Attentive readers will note that I ought to be talking about pointers to + instances throughout. I'm trying to limit the weight of the notation. + Besides, I prefer data models as found in Lisp and Python where all values + are held by reference.} % + +Eiffel's argument covariance is unsafe.\footnote{% + Argument covariance is correct if you're doing runtime dispatch based on + argument types. Eiffel isn't: it's single dispatch, like Sod is.} % +Suppose that we have two pairs of classes, $a \prec_1 b$ and $c \prec_1 d$. +Class $b$ defines a message $m$ with signature $d \to @|int|$; class $a$ +defines a method with signature $c \to @|int|$. This means that it's wrong +to send $m$ to an instance of $a$ carrying an argument of type $d$. But of +course, we can treat an instance of $a$ as if it's an instance of $b$, +whereupon it appears that we are permitted to pass a~$d$ in our message. The +result is a well-known hole in the type system. Oops. + +\Cplusplus's return-type covariance is fine. Also fine is argument +\emph{contravariance}. If $b$ defined its message to have signature $c \to +@|int|$, and $a$ were to broaden its method to $d \to @|int|$, there'd be no +problem. All $c$s are $d$s, so viewing an $a$ as a $b$ does no harm. + +All of this fiddling with types is fine as long as method inheritance or +overriding is an all-or-nothing thing. 
But Sod has method combinations, +where applicable methods are taken from the instance's class and all its +superclasses and combined. And this makes everything very messy. + +It's possible to sort all of the mess out in the generated effective method +-- we'd just have to convert the arguments to the types that were expected by +the direct methods. This would require expensive run-time conversions of all +of the non-invariant arguments and return values. And we'd need some +complicated rule so that we could choose sensible types for the method +entries in our vtables. Something like this: +\begin{quote} \itshape + For each named argument of a message, there must be a unique greatest type + among the types given for that argument by the applicable methods; and + there must be a unique least type among all of the return types of the + applicable methods. +\end{quote} +I have visions of people wanting to write special no-effect methods whose +only purpose is to guide the translator around the class graph properly. +Let's not. + +%% things to talk about: +%% Liskov substitution principle and why it's mad + +%%%----- That's all, folks -------------------------------------------------- + +%%% Local variables: +%%% mode: LaTeX +%%% TeX-master: "sod.tex" +%%% TeX-PDF-mode: t +%%% End: diff --git a/doc/lispintro.tex b/doc/lispintro.tex new file mode 100644 index 0000000..5f1043f --- /dev/null +++ b/doc/lispintro.tex @@ -0,0 +1,207 @@ +%%% -*-latex-*- +%%% +%%% Description of the internal class structure and protocol +%%% +%%% (c) 2009 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Simple Object Definition system. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. 
+%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +\chapter{Protocol overview} \label{ch:proto} + +This chapter provides an overview of the Sod translator's internal object +model. It describes most of the important classes and generic functions, how +they are used to build a model of a Sod module and produce output code, and +how an extension might modify the translator's behaviour. + +I assume familiarity with the Common Lisp Object System (CLOS). Familiarity +with the CLOS Metaobject Protocol isn't necessary but may be instructive. + +%%%-------------------------------------------------------------------------- +\section{A tour through the translator} + +At the very highest level, the Sod translator works in two phases: it +\emph{parses} source files into an internal representation, and then it +\emph{generates} output files from the internal representation. + +The function @|read-module| is given a pathname for a file: it opens the +file, parses the program text, and returns a @|module| instance describing +the classes and other items found. Parsing has a number of extension points +which allow extensions to add their own module syntax. Properties can be +attached to modules and the items defined within them, which select which +internal classes are used to represent them, and possibly provide additional +parameters to them. + +Modules contain a variety of objects, but the most important objects are +classes, which are associated with a menagerie of other objects representing +the slots, messages, methods and so on defined in the module. 
These various +objects engage in a (fairly complicated) protocol to construct another +collection of \emph{layout objects} describing the low-level data structures +and tables which need to be created. + +At the far end, the main output function is @|output-module|, which is given +a module, an output stream and a \emph{reason}, which describes what kind of +output is wanted. The module items and the layout objects then engage in +another protocol to work out what output needs to be produced, and which +order it needs to be written in. + +%%%-------------------------------------------------------------------------- +\section{Specification conventions} \label{sec:proto.conventions} + +Throughout this specification, the phrase `it is an error' indicates that a +particular circumstance is erroneous and results in unspecified and possibly +incorrect behaviour. In particular, the situation need not be immediately +diagnosed, and the consequences may be far-reaching. + +The following conventions apply throughout this specification. + +\begin{itemize} + +\item If a specification describes an argument as having a particular type or + syntax, then it is an error to provide an argument not having that + particular type or syntax. + +\item If a specification describes a function then that function might be + implemented as a generic function; it is an error to attempt to (re)define + it as a generic function, or to attempt to add methods to it. A function + specified as being a generic function will certainly be so; if user methods + are permitted on the generic function then this will be specified. + +\item Where a class precedence list is specified, either explicitly or + implicitly by a class hierarchy, the implementation may include additional + superclasses not specified here. 
Such additional superclasses will not + affect the order of specified classes in the class precedence lists either + of specified classes themselves or of user-defined subclasses of specified + classes. + +\item Unless otherwise specified, generic functions use the standard method + combination. + +\item The specifications for methods are frequently brief; they should be + read in conjunction with and in the context of the specification for the + generic function and specializing classes, if any. + +\item An object $o$ is a \emph{direct instance} of a class $c$ if @|(eq + (class-of $o$) $c$)|; $o$ is an \emph{instance} of $c$ if it is a direct + instance of any subclass of $c$. + +\item If a class is specified as being \emph{abstract} then it is an error to + construct direct instances of it, e.g., using @|make-instance|. + +\item If an object is specified as being \emph{immutable} then it is an error + to mutate it, e.g., using @|(setf (slot-value \ldots) \ldots)|. Programs + may rely on immutable objects retaining their state. + +\item A value is \emph{fresh} if it is guaranteed to be not @|eql| to any + previously existing value. A list is \emph{fresh} if it is guaranteed that + none of the cons cells in its main cdr chain (i.e., the list head, its cdr, + and so on) are @|eql| to any previously existing value. + +\item Unless otherwise specified, it is an error to mutate any part of a value + passed as an argument to, or a non-fresh part of a value returned by, a + function specified in this document. + +\item Unless otherwise specified, it is an error to change the class of an + instance of any class described here; and it is an error to change the + class of an object to a class described here. + +\end{itemize} + +\subsection{Format of the entries} \label{sec:proto.conventions.format} + +Most symbols defined by the protocol have their own entries. 
An entry begins +with a header line, showing a synopsis of the symbol on the left, and the +category (function, class, macro, etc.) on the right. + +\begin{describe}{fun}{example-function @ + \&optional @ + \&rest @ + \&key :keyword + @> @} + The synopsis for a function, generic function or method describes the + function's lambda-list using the usual syntax. Note that keyword arguments + are shown by naming their keywords; in the description, the value passed + for the keyword argument @|:keyword| is shown as @. + + If no results are shown, then the return values (if any) are not + specified. Functions may return more than one result, e.g., + \begin{quote} \sffamily + floor @ \&optional (@ 1) @> @ @ + \end{quote} + or possibly multiple results, e.g., + \begin{quote} \sffamily + values \&rest @ @> @^* + \end{quote} + + For a method, specializers are shown using the usual @|defmethod| syntax, + e.g., + \begin{quote} \sffamily + some-generic-function ((@ list) @) + @> @ + \end{quote} +\end{describe} + +\begin{describe}{mac}{example-macro + ( @{ @ @! (@ @) @}^* ) \\ \ind + @[[ @^* @! @ @]] \\ + @^* + \nlret @^*} + The synopsis for a macro describes the acceptable syntax using the + following notation. + \begin{itemize} + \item Literal symbols, e.g., keywords and parenthesis, are shown in + @|sans|. + \item Metasyntactic variables are shown in (roman) @. + \item Items are grouped together by braces `@{ $\dots$ @}'. The notation + `@{ $\dots$ @}^*' indicates that the enclosed items may be repeated zero + or more times; `@{ $\dots$ @}^+' indicates that the enclosed items may be + repeated one or more times. This notation may be applied to a single + item without the braces. + \item Optional items are shown enclosed in brackets `@[ $\dots$ @]'. + \item Alternatives are separated by vertical bars `@!'; the vertical bar + has low precedence, so alternatives extend as far as possible between + bars and up to the enclosing brackets if any. 
+ \item A sequence of alternatives enclosed in double-brackets `@[[ $\ldots$ + @]]' indicates that the alternatives may occur in any order, but each may + appear at most once unless marked by a star. + \item The notation for results is the same as for functions. + \end{itemize} + For example, the notation at the head of this example describes syntax + for @|let|. +\end{describe} + +\begin{describe}{cls}{example-class (direct-super other-direct-super) \&key + :initarg} + The synopsis for a class lists the class's direct superclasses, and the + acceptable initargs in the form of a lambda-list. The initargs may be + passed to @|make-instance| when constructing an instance of the class or a + subclass of it. If instances of the class may be reinitialized, or if + objects can be changed to be instances of the class, then these initargs + may also be passed to @|reinitialize-instance| and/or @|change-class| as + applicable; the class description will state explicitly when these + operations are allowed. +\end{describe} + +%%%----- That's all, folks -------------------------------------------------- + +%%% Local variables: +%%% mode: LaTeX +%%% TeX-master: "sod.tex" +%%% TeX-PDF-mode: t +%%% End: diff --git a/doc/output.tex b/doc/output.tex new file mode 100644 index 0000000..29a8b4d --- /dev/null +++ b/doc/output.tex @@ -0,0 +1,116 @@ +%%% -*-latex-*- +%%% +%%% Output machinery +%%% +%%% (c) 2015 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Sensible Object Design, an object system for C. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. 
+%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +\chapter{The output system} \label{ch:output} + +%%%-------------------------------------------------------------------------- + +%% output for `h' files +%% +%% prologue +%% guard start +%% typedefs start +%% typedefs +%% typedefs end +%% includes start +%% includes +%% includes end +%% classes start +%% CLASS banner +%% CLASS islots start +%% CLASS islots slots +%% CLASS islots end +%% CLASS vtmsgs start +%% CLASS vtmsgs CLASS start +%% CLASS vtmsgs CLASS slots +%% CLASS vtmsgs CLASS end +%% CLASS vtmsgs end +%% CLASS vtables start +%% CLASS vtables CHAIN-HEAD start +%% CLASS vtables CHAIN-HEAD slots +%% CLASS vtables CHAIN-HEAD end +%% CLASS vtables end +%% CLASS vtable-externs +%% CLASS vtable-externs-after +%% CLASS methods start +%% CLASS methods +%% CLASS methods end +%% CLASS ichains start +%% CLASS ichains CHAIN-HEAD start +%% CLASS ichains CHAIN-HEAD slots +%% CLASS ichains CHAIN-HEAD end +%% CLASS ichains end +%% CLASS ilayout start +%% CLASS ilayout slots +%% CLASS ilayout end +%% CLASS conversions +%% CLASS object +%% classes end +%% guard end +%% epilogue + +%% output for `c' files +%% +%% prologue +%% includes start +%% includes +%% includes end +%% classes start +%% CLASS banner +%% CLASS direct-methods start +%% CLASS direct-methods METHOD start +%% CLASS direct-methods METHOD body +%% CLASS direct-methods METHOD end +%% CLASS direct-methods end +%% CLASS effective-methods +%% CLASS vtables start +%% CLASS vtables CHAIN-HEAD start +%% CLASS vtables CHAIN-HEAD class-pointer METACLASS +%% CLASS 
vtables CHAIN-HEAD base-offset +%% CLASS vtables CHAIN-HEAD chain-offset TARGET-HEAD +%% CLASS vtables CHAIN-HEAD vtmsgs CLASS start +%% CLASS vtables CHAIN-HEAD vtmsgs CLASS slots +%% CLASS vtables CHAIN-HEAD vtmsgs CLASS end +%% CLASS vtables CHAIN-HEAD end +%% CLASS vtables end +%% CLASS object prepare +%% CLASS object start +%% CLASS object CHAIN-HEAD ichain start +%% CLASS object SUPER slots start +%% CLASS object SUPER slots +%% CLASS object SUPER vtable +%% CLASS object SUPER slots end +%% CLASS object CHAIN-HEAD ichain end +%% CLASS object end +%% classes end +%% epilogue + +%%%----- That's all, folks -------------------------------------------------- + +%%% Local variables: +%%% mode: LaTeX +%%% TeX-master: "sod.tex" +%%% TeX-PDF-mode: t +%%% End: diff --git a/doc/parsing.tex b/doc/parsing.tex new file mode 100644 index 0000000..1c4c3cd --- /dev/null +++ b/doc/parsing.tex @@ -0,0 +1,370 @@ +%%% -*-latex-*- +%%% +%%% Description of the parsing machinery +%%% +%%% (c) 2015 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Sensible Object Design, an object system for C. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. +%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ +\chapter{Parsing} \label{ch:parsing} + +%%%-------------------------------------------------------------------------- +\section{The parser protocol} \label{sec:parsing.proto} + +For the purpose of Sod's parsing library, \emph{parsing} is the process of +reading a sequence of input items, in order, and computing an output value. + +A \emph{parser} is an expression which consumes zero or more input items and +returns three values: a \emph{result}, a \emph{success flag}, and a +\emph{consumed flag}. The two flags are (generalized) booleans. If the +success flag is non-nil, then the parser is said to have \emph{succeeded}, +and the result is the parser's output. If the success flag is nil then the +parser is said to have \emph{failed}, and the result is a list of +\emph{indicators}. Finally, the consumed flag is non-nil if the parser +consumed any input items. + +%%%-------------------------------------------------------------------------- +\section{File locations} + +%%%-------------------------------------------------------------------------- +\section{Scanners} \label{sec:parsing.scanner} + +A \emph{scanner} is an object which keeps track of a parser's progress as it +works through its input. There's no common base class for scanners: a +scanner is simply any object which implements the scanner protocol described +here. + +A scanner maintains a sequence of items to read. It can step forwards +through the items, one at a time, until it reaches the end (if, indeed, the +sequence is finite, which it needn't be). Until that point, there is a +current item, though there's no protocol for accessing it at this level +because the nature of the items is left unspecified. + +Some scanners support an additional \emph{place-capture} protocol which +allows rewinding the scanner to an earlier point in the input so that it can +be scanned again. 
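The scanner and parser ideas introduced so far can be sketched as a C analogue. To be clear about the assumptions: Sod's parsing library is written in Common Lisp, and nothing below is part of it. The names @|toy_scanner|, @|toy_place|, @|toy_result| and @|toy_parse_literal| are hypothetical, the items are simply characters in a string, and a place is just a saved index. The sketch shows a parser reporting a success flag and a consumed flag, and using a captured place to rewind when it fails part-way through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner over a fixed buffer of characters. */
struct toy_scanner { const char *buf; size_t len; size_t pos; };

/* A captured place is just a saved position in this toy model. */
typedef size_t toy_place;

/* Parser outcome: the two flags described in the text. */
struct toy_result { int success; int consumed; };

/* Nonzero if there are no more items to read. */
static int toy_scanner_at_eof_p(const struct toy_scanner *sc)
  { return (sc->pos >= sc->len); }

/* Advance to the next item; stepping at end-of-file is an error. */
static void toy_scanner_step(struct toy_scanner *sc)
  { assert(!toy_scanner_at_eof_p(sc)); sc->pos++; }

/* Capture the current position so that we can rewind to it later. */
static toy_place toy_scanner_capture_place(const struct toy_scanner *sc)
  { return (sc->pos); }

/* Rewind to a previously captured place. */
static void toy_scanner_restore_place(struct toy_scanner *sc, toy_place pl)
  { assert(pl <= sc->len); sc->pos = pl; }

/* Parse the literal string `lit'.  On failure, rewind to where we started,
 * so that the parser reports that it consumed nothing. */
static struct toy_result toy_parse_literal(struct toy_scanner *sc,
                                           const char *lit)
{
  struct toy_result r = { 1, 0 };
  toy_place start = toy_scanner_capture_place(sc);

  for (; *lit; lit++) {
    if (toy_scanner_at_eof_p(sc) || sc->buf[sc->pos] != *lit) {
      toy_scanner_restore_place(sc, start);
      r.success = 0; r.consumed = 0;
      return (r);
    }
    toy_scanner_step(sc);
    r.consumed = 1;
  }
  return (r);
}
```

Because the whole input is already in memory here, holding on to a place costs nothing and there is nothing to release. A scanner reading from a stream would have to buffer everything after the oldest captured place, which is why releasing places matters for such scanners.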
+ +\subsection{Basic scanner protocol} \label{sec:parsing.scanner.basic} + +The basic protocol supports stepping the scanner forward through its input +sequence, and detecting the end of the sequence. + +\begin{describe}{gf}{scanner-step @} + Advance the @ to the next item, which becomes current. + + It is an error to step the scanner if the scanner is at end-of-file. +\end{describe} + +\begin{describe}{gf}{scanner-at-eof-p @ @> @} + Return non-nil if the scanner is at end-of-file, i.e., there are no more + items to read. + + If nil is returned, there is a current item, and it is safe to step the + scanner again; otherwise, it is an error to query the current item or to + step the scanner. +\end{describe} + +\subsection{Place-capture scanner protocol} \label{sec:parsing.scanner.place} + +The place-capture protocol allows rewinding to an earlier point in the +sequence. Not all scanners support the place-capture protocol. + +To rewind a scanner to a particular point, that point must be \emph{captured} +as a \emph{place} when it's current -- so you must know in advance that this +is an interesting place that's worth capturing. The type of place returned +depends on the type of scanner. Given a captured place, the scanner can be +rewound to the position held in it. + +Depending on how the scanner works, holding onto a captured place might +consume a lot of memory or cause poor performance. For example, if the +scanner is reading from an input stream, having a captured place means that +data from that point on must be buffered in case the program needs to rewind +the scanner and read that data again. Therefore it's possible to +\emph{release} a place when it turns out not to be needed any more. + +\begin{describe}{gf}{scanner-capture-place @ @> @} + Capture the @'s current position as a place, and return the place. +\end{describe} + +\begin{describe}{gf}{scanner-restore-place @ @} + Rewind the @ to the state it was in when @ was captured.
+ In particular, the item that was current when the @ was captured + becomes current again. + + It is an error to restore a @ that has been released, or if the + @ wasn't captured from the @. +\end{describe} + +\begin{describe}{gf}{scanner-release-place @ @} + Release the @, to avoid having to maintain the ability to restore + it after it's not needed any more. + + It is an error if the @ wasn't captured from the @. +\end{describe} + +\begin{describe}{mac} + {with-scanner-place (@ @) @^* @> @^*} + Capture the @'s current position as a place, evaluate the + @s as an implicit progn with the variable @ bound to the captured + place. When control leaves the @s, the place is released. The return + values are the values of the final @. +\end{describe} + +\subsection{Scanner file-location protocol} \label{sec:parsing.scanner.floc} + +Some scanners participate in the file-location protocol (\xref{sec:floc}). +They implement a method on @|file-location| which collects the necessary +information using scanner-specific functions described here. + +\begin{describe}{fun}{scanner-file-location @ @> @} + Return a @|file-location| object describing the current position of the + @. + + This calls the @|scanner-filename|, @|scanner-line| and @|scanner-column| + generic functions on the scanner, and uses these to fill in an appropriate + @|file-location|. + + Since there are default methods on these generic functions, it is not an + error to call @|scanner-file-location| on any kind of value, but it might + not be very useful. This function exists to do the work of appropriately + specialized methods on @|file-location|. +\end{describe} + +\begin{describe}{gf}{scanner-filename @ @> @} + Return the name of the file the scanner is currently processing, as a + string, or nil if the filename is not known. +\end{describe} + +\begin{describe}{meth}{scanner-filename (@ t) @> @} + Returns nil.
+\end{describe} + +\begin{describe}{gf}{scanner-line @ @> @} + Return the line number of the @'s current position, as an integer, + or nil if the line number is not known. +\end{describe} + +\begin{describe}{meth}{scanner-line (@ t) @> @} + Returns nil. +\end{describe} + +\begin{describe}{gf}{scanner-column @ @> @} + Return the column number of the @'s current position, as an + integer, or nil if the column number is not known. +\end{describe} + +\begin{describe}{meth}{scanner-column (@ t) @> @} + Returns nil. +\end{describe} + +\subsection{Character scanners} \label{sec:parsing.scanner.char} + +Character scanners are scanners which read sequences of characters. + +\begin{describe}{cls}{character-scanner () \&key} + Base class for character scanners. This provides some very basic + functionality. + + Not all character scanners are subclasses of @|character-scanner|. +\end{describe} + +\begin{describe}{gf}{scanner-current-char @ @> @} + Returns the current character. +\end{describe} + +\begin{describe}{gf}{scanner-unread @ @} + Rewind the @ by one step. The @ must be the previous + current character, and becomes the current character again. It is an error + if: the @ has reached end-of-file; the @ has never been + stepped; or @ was not the previous current character. +\end{describe} + +\begin{describe}{gf} + {scanner-interval @ @ \&optional @ + @> @} + Return the characters in the @'s input from @ up to (but + not including) @. + + The characters are returned as a string. If @ is omitted, return + the characters up to (but not including) the current position. It is an + error if @ precedes @ or they are from different + scanners. + + This function is a character-scanner-specific extension to the + place-capture protocol; not all character scanners implement the + place-capture protocol, and some that do may not implement this function.
+\end{describe} + +\subsubsection{Stream access to character scanners} +Sometimes it can be useful to apply the standard Lisp character input +operations to the sequence of characters held by a character scanner. + +\begin{describe}{gf}{make-scanner-stream @ @> @} + Returns a fresh input @|stream| object which fetches input characters from + the character scanner object @. Reading characters from the + stream steps the scanner. The stream will reach end-of-file when the + scanner reports end-of-file. If the scanner implements the file-location + protocol then reading from the stream will change the file location in an + appropriate manner. + + This is mostly useful for applying standard Lisp stream functions, most + particularly the @|read| function, in the middle of a parsing operation. +\end{describe} + +\begin{describe}{cls}{character-scanner-stream (stream) \&key :scanner} + A Common Lisp input @|stream| object which works using the character + scanner protocol. Any @ which implements the base scanner and + character scanner protocols is suitable. See @|make-scanner-stream|. +\end{describe} + +\subsection{String scanners} \label{sec:parsing.scanner.string} + +A \emph{string scanner} is a simple kind of character scanner which reads +input from a string object. String scanners implement the character scanner +and place-capture protocols. + +\begin{describe}{cls}{string-scanner} + The class of string scanners. The @|string-scanner| class is not a + subclass of @|character-scanner|. +\end{describe} + +\begin{describe}{fun}{string-scanner-p @ @> @} + Return non-nil if @ is a @|string-scanner| object; otherwise return + nil. +\end{describe} + +\begin{describe}{fun} + {make-string-scanner @ \&key :start :end @> @} + Construct and return a fresh @|string-scanner| object. The new scanner + will read characters from @, starting at index @ (which + defaults to zero), and continuing until it reaches index @ (defaults + to the end of the @). 
+\end{describe} + +\subsection{Character buffer scanners} \label{sec:parsing.scanner.charbuf} + +A \emph{character buffer scanner}, or \emph{charbuf scanner} for short, is an +efficient scanner for reading characters from an input stream. Charbuf +scanners implement the basic scanner, character buffer, place-capture, and +file-location protocols. + +\begin{describe}{cls} + {charbuf-scanner (character-scanner) + \&key :stream :filename :line :column} + The class of charbuf scanners. The scanner will read characters from + @. Charbuf scanners implement the file-location protocol: the + initial location is set from the given @, @ and @; + the scanner will update the location as it reads its input. +\end{describe} + +\begin{describe}{cls}{charbuf-scanner-place} + The class of place objects captured by a charbuf scanner. +\end{describe} + +\begin{describe}{fun} + {charbuf-scanner-place-p @ @> @} + Type predicate for charbuf scanner places: returns non-nil if @ is a + place captured by a charbuf scanner, and nil otherwise. +\end{describe} + +\begin{describe}{gf} + {charbuf-scanner-map @ @ \&optional @ + \nlret @ @ @} + Read characters from the @'s buffers. + + This is intended to be an efficient and versatile interface for reading + characters from a scanner in bulk. The function @ is invoked + repeatedly, as if by + \begin{prog} + (multiple-value-bind (@ @) \\ \ind\ind + (funcall @ @ @ @) \- \\ + \textrm\ldots) + \end{prog} + The argument @ is a simple string; @ and @ are two + nonnegative fixnums, indicating that the subsequence of @ between + @ (inclusive) and @ (exclusive) should be processed. If + @'s return value @ is nil then @ is ignored: the + function has consumed the entire buffer and wishes to read more. If + @ is non-nil, then it must be a fixnum such that $@ \le + @ \le @$: the function has consumed the buffer as far as @ + (exclusive) and has completed successfully.
+ + If end-of-file is encountered before @ completes successfully then it + fails: the @ function is called with no arguments, and is expected to + return two values. If omitted, @ defaults to + \begin{prog} + (lambda () \\ \ind + (values nil nil))% + \end{prog} + + The @|charbuf-scanner-map| function returns three values. The first value + is the non-nil @ value returned by @ if @|charbuf-scanner-map| + succeeded, or the first value returned by @; the second value is @|t| + on success, or the second value returned by @; the third value is + non-nil if @ consumed any input, i.e., it returned with @ nil + at least once, or with $@ > @$. +\end{describe} + +\subsection{Token scanners} \label{sec:parsing.scanner.token} + +\begin{describe}{cls} + {token-scanner () \&key :filename (:line 1) (:column 0)} +\end{describe} + +\begin{describe}{gf}{token-type @ @> @} +\end{describe} + +\begin{describe}{gf}{token-value @ @> @} +\end{describe} + +\begin{describe}{gf}{scanner-token @ @> @ @} +\end{describe} + +\begin{describe}{ty}{token-scanner-place} +\end{describe} + +\begin{describe}{fun} + {token-scanner-place-p @ @> @} +\end{describe} + +\subsection{List scanners} + +\begin{describe}{ty}{list-scanner} +\end{describe} + +\begin{describe}{fun}{list-scanner-p @ @> @} +\end{describe} + +\begin{describe}{fun}{make-list-scanner @ @> @} +\end{describe} + +%%%-------------------------------------------------------------------------- +\section{Parsing macros} + +%%%-------------------------------------------------------------------------- +\section{Lexical analyser} + +%%%----- That's all, folks -------------------------------------------------- + +%%% Local variables: +%%% mode: LaTeX +%%% TeX-master: "sod.tex" +%%% TeX-PDF-mode: t +%%% End: diff --git a/doc/sod-backg.tex b/doc/sod-backg.tex deleted file mode 100644 index 0cb3877..0000000 --- a/doc/sod-backg.tex +++ /dev/null @@ -1,156 +0,0 @@ -%%% -*-latex-*- -%%% -%%% Background philosophy -%%% -%%% (c) 2009 Straylight/Edgeware 
-%%% - -%%%----- Licensing notice --------------------------------------------------- -%%% -%%% This file is part of the Simple Object Definition system. -%%% -%%% SOD is free software; you can redistribute it and/or modify -%%% it under the terms of the GNU General Public License as published by -%%% the Free Software Foundation; either version 2 of the License, or -%%% (at your option) any later version. -%%% -%%% SOD is distributed in the hope that it will be useful, -%%% but WITHOUT ANY WARRANTY; without even the implied warranty of -%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -%%% GNU General Public License for more details. -%%% -%%% You should have received a copy of the GNU General Public License -%%% along with SOD; if not, write to the Free Software Foundation, -%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. - -\chapter{Philosophical background} - -%%%-------------------------------------------------------------------------- -\section{Superclass linearization} - -Before making any decisions about relationships between superclasses, Sod -\emph{linearizes} them, i.e., imposes a total order consistent with the -direct-subclass/superclass partial order. - -In the vague hope that we won't be completely bogged down in formalism by the -end of this, let's introduce some notation. We'll fix some class $z$ and -consider its set of superclasses $S(z) = \{ a, b, \dots \}$. We can define a -relation $c \prec_1 d$ if $c$ is a direct subclass of $d$, and extend it by -taking the reflexive, transitive closure: $c \preceq d$ if and only if -\begin{itemize} -\item $c = d$, or -\item there exists some class $x$ such that $c \prec_1 x$ and $x \preceq d$. -\end{itemize} -This is the `is-subclass-of' relation we've been using so far.\footnote{% - In some object systems, notably Flavors, this relation is allowed to fail - to be a partial order because of cycles in the class graph.
I haven't - given a great deal of thought to how well Sod would cope with a cyclic - class graph.} % -We write $d \succeq c$ and say that $d$ is a superclass of $c$ if and only if -$c \preceq d$. - -The problem comes when we try to resolve inheritance questions. A class -should inherit behaviour from its superclasses; but, in a world of multiple -inheritance, which one do we choose? We get a simple version of this problem -when we try to resolve inheritance of slot initializers: only one initializer -can be inherited. - -We start by collecting into a set~$I$ the classes which define an initializer -for the slot. If $I$ contains both a class $x$ and one of $x$'s superclasses -then we should prefer $x$ and consider the superclass to be overridden. So -we should confine our attention to \emph{least} classes: a member $x$ of a -set $I$ is least, with respect to a particular partial order, if $y \preceq -x$ only when $x = y$. If there is a single least class in our set then we -have a winner. Otherwise we want some way to choose among them. - -This is not uncontroversial. Languages such as \Cplusplus\ refuse to choose -among least classes; instead, any program in which such a choice must be made -is simply declared erroneous. - -Simply throwing up our hands in horror at this situation is satisfactory when -we only wanted to pick one `winner', as we do for slot initializers. -However, method combination is a much more complicated business. We don't -want to pick just one winner: we want to order all of the applicable methods -in some way. Insisting that there is a clear winner at every step along the -chain is too much of an imposition. Instead, we \emph{linearize} the -classes. - -%%%-------------------------------------------------------------------------- -\section{Invariance, covariance, contravariance} - -In Sod, at least with regard to the existing method combinations, method -types are \emph{invariant}. This is not an accident, and it's not due to -ignorance.
- -The \emph{signature} of a function, method or message describes its argument -and return-value types. If a method's arguments are an integer and a string, -and it returns a character, we might write its signature as -\[ (@|int|, @|string|) \to @|char| \] -In Sod, a method's arguments have to match its message's arguments precisely, -and the return type must either be @|void| -- for a dæmon method -- or again -match the message's return type. This is argument and return-type -\emph{invariance}. - -Some object systems allow methods with subtly different signatures to be -defined on a single message. In particular, since the idea is that instances -of a subclass ought to be broadly compatible~(see \xref{sec:phil.lsp}) with -existing code which expects instances of a superclass, we might be able to -get away with bending method signatures one way or another to permit this. - -\Cplusplus\ permits \emph{return-type covariance}, where a method's return -type can be a subclass of the return type specified by a less-specific -method. Eiffel allows \emph{argument covariance}, where a method's arguments -can be subclasses of the arguments specified by a less-specific -method.\footnote{% - Attentive readers will note that I ought to be talking about pointers to - instances throughout. I'm trying to limit the weight of the notation. - Besides, I prefer data models as found in Lisp and Python where all values - are held by reference.} % - -Eiffel's argument covariance is unsafe.\footnote{% - Argument covariance is correct if you're doing runtime dispatch based on - argument types. Eiffel isn't: it's single dispatch, like Sod is.} % -Suppose that we have two pairs of classes, $a \prec_1 b$ and $c \prec_1 d$. -Class $b$ defines a message $m$ with signature $d \to @|int|$; class $a$ -defines a method with signature $c \to @|int|$. This means that it's wrong -to send $m$ to an instance $a$ carrying an argument of type $d$. 
But of -course, we can treat an instance of $a$ as if it's an instance of $b$, -whereupon it appears that we are permitted to pass a~$d$ in our message. The -result is a well-known hole in the type system. Oops. - -\Cplusplus's return-type covariance is fine. Also fine is argument -\emph{contravariance}. If $b$ defined its message to have signature $c \to -@|int|$, and $a$ were to broaden its method to $d \to @|int|$, there'd be no -problem. All $c$s are $d$s, so viewing an $a$ as a $b$ does no harm. - -All of this fiddling with types is fine as long as method inheritance or -overriding is an all-or-nothing thing. But Sod has method combinations, -where applicable methods are taken from the instance's class and all its -superclasses and combined. And this makes everything very messy. - -It's possible to sort all of the mess out in the generated effective method --- we'd just have to convert the arguments to the types that were expected by -the direct methods. This would require expensive run-time conversions of all -of the non-invariant arguments and return values. And we'd need some -complicated rule so that we could choose sensible types for the method -entries in our vtables. Something like this: -\begin{quote} \itshape - For each named argument of a message, there must be a unique greatest type - among the types given for that argument by the applicable methods; and - there must be a unique least type among all of the return types of the - applicable methods. -\end{quote} -I have visions of people wanting to write special no-effect methods whose -only purpose is to guide the translator around the class graph properly. -Let's not.
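Returning to the argument-covariance hole described earlier in this section, a small sketch may make it concrete. This is Python rather than Sod or Eiffel, so the failure shows up at run time instead of being missed by a static checker; the class names `A`, `B`, `C`, `D` simply mirror the $a$, $b$, $c$, $d$ of the text, and `c_only` and `send_m` are hypothetical names invented for the example.

```python
class D:
    """The broader argument type (d in the text)."""


class C(D):
    """A subclass of D (c in the text) offering an extra operation."""

    def c_only(self):
        return 42


class B:
    """Defines message m with signature d -> int."""

    def m(self, arg):
        return 0


class A(B):
    """Narrows m covariantly to c -> int: the body relies on C-only behaviour."""

    def m(self, arg):
        return arg.c_only()


def send_m(receiver, arg):
    # Written against B's signature, so any D looks like a valid argument.
    return receiver.m(arg)
```

Here `send_m(B(), D())` and `send_m(A(), C())` both behave, but `send_m(A(), D())` treats an `A` as a `B`, passes a plain `D`, and fails with `AttributeError` inside `A.m`. Swapping the roles, so that `B.m` expects a `C` and `A.m` accepts any `D`, gives the contravariant arrangement, which cannot fail this way.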
- -%% things to talk about: -%% Liskov substitution principle and why it's mad - -%%%----- That's all, folks -------------------------------------------------- - -%%% Local variables: -%%% mode: LaTeX -%%% TeX-master: "sod.tex" -%%% TeX-PDF-mode: t -%%% End: diff --git a/doc/sod.sty b/doc/sod.sty new file mode 100644 index 0000000..105a6c7 --- /dev/null +++ b/doc/sod.sty @@ -0,0 +1,168 @@ +%%% -*-latex-*- +%%% +%%% Styles and other hacking for the Sod manual +%%% +%%% (c) 2015 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Sensible Object Design, an object system for C. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. +%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +\ProvidesPackage{sod} + +%% More reference types. +\defxref{p}{part} + +%% Other languages with special typesetting. +\def\Cplusplus{C\kern-\p@++} +\def\Csharp{C\#} + +%% Special maths notation. +\def\chain#1#2{\mathsf{ch}_{#1}(#2)} +\def\chainhead#1#2{\mathsf{hd}_{#1}(#2)} +\def\chaintail#1#2{\mathsf{tl}_{#1}(#2)} + +%% Other mathematical tweaks. +\let\implies\Rightarrow +\let\epsilon\varepsilon + +%% Unix manpage references. +\def\man#1#2{\textbf{#1}(#2)} + +%% Listings don't need to be small. +\let\listingsize\relax + +%% Metavariables are italics without decoration.
+\def\syntleft{\normalfont\itshape} +\let\syntright\empty + +%% Literal code is in sans face. +\let\codeface\sffamily +\def\code#1{\ifmmode\hbox\fi{\normalfont\codeface\/#1\/}} +\def\ulitleft{\normalfont\codeface} +\let\ulitright\empty + +%% Conditionally enter maths mode. Can't use \ensuremath here because we +%% aren't necessarily sure where the maths will actually end. +\let\m@maybe@end\relax +\def\m@maybe{\ifmmode\else$\let\m@maybe@end$\fi} + +%% Standard syntax shortcuts. +\atdef <#1>{\synt{#1}\@scripts} +\atdef "#1"{\lit*{#1}\@scripts} +\atdef `#1'{\lit{#1}\@scripts} +\atdef |#1|{\textsf{#1}\@scripts} + +%% A handy abbreviation; `\\' itself is too good to steal. +\atdef \\{\textbackslash} + +%% Intercept grammar typesetting and replace the vertical bar with the +%% maths-font version. +\let\@@grammar\grammar +\def\grammar{\def\textbar{\hbox{$|$}}\@@grammar} + +%% Collect super- and subscripts. (Note that underscores are active for the +%% most part.) When we're done, end maths mode if we entered it +%% conditionally. +\def\@scripts{\futurelet\@ch\@scripts@i} +\begingroup\lccode`\~=`\_\lowercase{\endgroup +\def\@scripts@i{\if1\ifx\@ch~1\else\ifx\@ch^1\else0\fi\fi% + \expandafter\@scripts@ii\else\expandafter\m@maybe@end\fi}} +\def\@scripts@ii#1#2{\m@maybe#1{#2}\@scripts} + +%% Doubling characters, maybe. Either way, chain onto \@scripts. +\def\dbl@maybe#1{\let\@tempa#1\futurelet\@ch\dbl@maybe@i} +\def\dbl@maybe@i{\m@maybe\ifx\@ch\@tempa\@tempa\!\@tempa% + \expandafter\@firstoftwo\expandafter\@scripts% + \else\@tempa\expandafter\@scripts\fi} + +%% Extra syntax for Lisp templates. These produce the maths-font versions of +%% characters, which should contrast well against the sans face used for +%% literals. 
+\atdef [{\dbl@maybe[} +\atdef ]{\dbl@maybe]} +\atdef {{\m@maybe\{\@scripts} +\atdef }{\m@maybe\}\@scripts} +\atdef ({\m@maybe(\@scripts} +\atdef ){\m@maybe)\@scripts} +\atdef !{\m@maybe|\@scripts} +\def\returns{\m@maybe\longrightarrow\m@maybe@end\hspace{0.5em}\ignorespaces} +\atdef >{\leavevmode\unskip\hspace{0.5em}\returns} + +%% Comment setting. +\atdef ;#1\\{\normalfont\itshape;#1\\} + +%% Environment for setting programs. Newlines are explicit, because +%% otherwise I need comments in weird places to make the vertical spacing +%% come out properly. You can write `\obeylines' if you really want to. +\def\prog{\codeface\quote\tabbing} +\def\endprog{\endtabbing\endquote} +\def\ind{\quad\=\+\kill} + +%% Put a chunk of text in a box. +\newenvironment{boxy}[1][\q@]{% + \dimen@\linewidth\advance\dimen@-1.2pt\advance\dimen@-2ex% + \medskip% + \vbox\bgroup\hrule\hbox\bgroup\vrule% + \vbox\bgroup\vskip1ex\hbox\bgroup\hskip1ex\minipage\dimen@% + \def\@temp{#1}\ifx\@temp\q@\else\leavevmode{\headfam\bfseries#1\quad}\fi% +}{% + \endminipage\hskip1ex\egroup\vskip1ex\egroup% + \vrule\egroup\hrule\egroup% + \medskip% +} + +%% Lisp documentation machinery. 
+\def\definedescribecategory#1#2{\@namedef{cat!#1}{#2}} +\def\describecategoryname#1{% + \expandafter\let\expandafter\@tempa\csname cat!#1\endcsname% + \ifx\@tempa\relax#1\else\@tempa\fi} +\definedescribecategory{fun}{function} +\definedescribecategory{gf}{generic function} +\definedescribecategory{var}{variable} +\definedescribecategory{const}{constant} +\definedescribecategory{meth}{primary method} +\definedescribecategory{ar-meth}{around-method} +\definedescribecategory{be-meth}{before-method} +\definedescribecategory{af-meth}{after-method} +\definedescribecategory{cls}{class} +\definedescribecategory{ty}{type} +\definedescribecategory{mac}{macro} +\def\nlret{\\\hspace{4em}\returns} + +\def\q@{\q@} +\newenvironment{describe}[3][\q@]{% + \normalfont% + \par\goodbreak% + \vspace{\bigskipamount}% + \setbox\z@\hbox{\bfseries[\describecategoryname{#2}]}% + \dimen@\linewidth\advance\dimen@-\wd\z@% + \def\@temp##1 ##2\q@{\message{#2:##1}\label{#2:##1}}% + \def\@tempa{#1}\ifx\@tempa\q@\@temp#3 \q@\else\@temp{#1} \q@\fi% + \edef\@temp{{\the\linewidth}{@{}p{\the\dimen@}% + @{\extracolsep{\fill}}l@{\extracolsep{0pt}}}}% + \noindent\csname tabular*\expandafter\endcsname\@temp% + \tabbing\codeface#3\endtabbing&\unhbox\z@\\\endtabular% +% \@afterheading% + \list{}{\rightmargin\z@}\item% +}{% + \endlist% +} + +%%% ----- That's all, folks -------------------------------------------------- +\endinput \ No newline at end of file diff --git a/doc/sod.tex b/doc/sod.tex index ff0dea4..4979b23 100644 --- a/doc/sod.tex +++ b/doc/sod.tex @@ -1,5 +1,32 @@ +%%% -*-latex-*- +%%% +%%% Description of the internal class structure and protocol +%%% +%%% (c) 2009 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Simple Object Definition system. 
+%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. +%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + \documentclass[noarticle]{strayman} +\errorcontextlines=999 + \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} \usepackage[palatino, helvetica, courier, maths=cmr]{mdwfonts} @@ -13,1195 +40,118 @@ \usepackage{at} \usepackage{mdwref} +\usepackage{sod} + \title{A Sensible Object Design for C} \author{Mark Wooding} -\makeatletter - -\errorcontextlines999 - -\def\syntleft{\normalfont\itshape} -\let\syntright\empty - -\let\codeface\sffamily - -\def\ulitleft{\normalfont\codeface} -\let\ulitright\empty - -\let\listingsize\relax - -\let\epsilon\varepsilon - -\atdef <#1>{\synt{#1}\@scripts} -\atdef "#1"{\lit*{#1}\@scripts} -\atdef `#1'{\lit{#1}\@scripts} -\atdef |#1|{\textsf{#1}\@scripts} -\def\dbl@maybe#1{\let\@tempa#1\futurelet\@ch\dbl@maybe@i} -\def\dbl@maybe@i{\m@maybe\ifx\@ch\@tempa\@tempa\!\@tempa% - \expandafter\@firstoftwo\expandafter\@scripts% - \else\@tempa\expandafter\@scripts\fi} -\atdef [{\dbl@maybe[} -\atdef ]{\dbl@maybe]} -\atdef {{\m@maybe\{\@scripts} -\atdef }{\m@maybe\}\@scripts} -\atdef ({\m@maybe(\@scripts} -\atdef ){\m@maybe)\@scripts} -\atdef !{\m@maybe|\@scripts} -\atdef to{\leavevmode\unskip\quad\m@maybe\longrightarrow\m@maybe@end\quad} -\let\m@maybe@end\relax -\def\m@maybe{\ifmmode\else$\let\m@maybe@end$\fi} 
-\def\@scripts{\futurelet\@ch\@scripts@i} - -\def\chain#1#2{\mathsf{ch}_{#1}(#2)} -\def\chainhead#1#2{\mathsf{hd}_{#1}(#2)} -\def\chaintail#1#2{\mathsf{tl}_{#1}(#2)} - -\let\implies\Rightarrow - -\atdef ;#1\\{\normalfont\itshape;#1\\} -\let\@@grammar\grammar -\def\grammar{\def\textbar{\hbox{$|$}}\@@grammar} - -\begingroup\lccode`\~=`\_\lowercase{\endgroup -\def\@scripts@i{\if1\ifx\@ch~1\else\ifx\@ch^1\else0\fi\fi% - \expandafter\@scripts@ii\else\expandafter\m@maybe@end\fi}} -\def\@scripts@ii#1#2{\m@maybe#1{#2}\@scripts} - -\def\Cplusplus{C\kern-\p@++} -\def\Csharp{C\#} -\def\man#1#2{\textbf{#1}(#2)} - -\begingroup\lccode`\~=`\ -\lowercase{ -\endgroup -\def\prog{% - \codeface% - \quote% - \let\old@nl\\% - \obeylines% - \tabbing% - \global\let~\\% - \global\let\\\textbackslash% -} -\def\endprog{% - \endtabbing% - \global\let\\\old@nl% - \endquote% -}} - -\newenvironment{boxy}[1][\q@]{% - \dimen@\linewidth\advance\dimen@-1.2pt\advance\dimen@-2ex% - \medskip% - \vbox\bgroup\hrule\hbox\bgroup\vrule% - \vbox\bgroup\vskip1ex\hbox\bgroup\hskip1ex\minipage\dimen@% - \def\@temp{#1}\ifx\@temp\q@\else\leavevmode{\headfam\bfseries#1\quad}\fi% -}{% - \endminipage\hskip1ex\egroup\vskip1ex\egroup% - \vrule\egroup\hrule\egroup% - \medskip% -} - -\def\definedescribecategory#1#2{\@namedef{cat!#1}{#2}} -\def\describecategoryname#1{% - \expandafter\let\expandafter\@tempa\csname cat!#1\endcsname% - \ifx\@tempa\relax#1\else\@tempa\fi} -\definedescribecategory{fun}{function} -\definedescribecategory{gf}{generic function} -\definedescribecategory{var}{variable} -\definedescribecategory{const}{constant} -\definedescribecategory{meth}{primary method} -\definedescribecategory{ar-meth}{around-method} -\definedescribecategory{be-meth}{before-method} -\definedescribecategory{af-meth}{after-method} -\definedescribecategory{cls}{class} -\definedescribecategory{ty}{type} -\definedescribecategory{mac}{macro} - -\def\q@{\q@} -\newenvironment{describe}[3][\q@]{% - \normalfont% - \par\goodbreak% - 
\vspace{\bigskipamount}% - \setbox\z@\hbox{\bfseries[\describecategoryname{#2}]}% - \dimen@\linewidth\advance\dimen@-\wd\z@% - \def\@temp##1 ##2\q@{\message{#2:##1}\label{#2:##1}}% - \def\@tempa{#1}\ifx\@tempa\q@\@temp#3 \q@\else\@temp{#1} \\\fi% - \edef\@temp{{\the\linewidth}{@{}p{\the\dimen@}% - @{\extracolsep{\fill}}l@{\extracolsep{0pt}}}}% - \noindent\csname tabular*\expandafter\endcsname\@temp% - \tabbing\codeface#3\endtabbing&\unhbox\z@\\\endtabular% -% \@afterheading% - \list{}{\rightmargin\z@}\item% -}{% - \endlist% -} - -\def\push{\quad\=\+\kill} - \begin{document} \maketitle -\include{sod-tut} - %%%-------------------------------------------------------------------------- -\chapter{Internals} - -\section{Generated names} - -The generated names for functions and objects related to a class are -constructed systematically so as not to interfere with each other. The rules -on class, slot and message naming exist so as to ensure that the generated -names don't collide with each other. - -The following notation is used in this section. -\begin{description} -\item[@] The full name of the `focus' class: the one for which we are - generating name. -\item[@] The nickname of a superclass. -\item[@] The nickname of the chain-head class of the chain - in question. -\end{description} - -\subsection{Instance layout} - -%%%-------------------------------------------------------------------------- -\section{Syntax} -\label{sec:syntax} - -Fortunately, Sod is syntactically quite simple. I've used a little slightly -unusual notation in order to make the presentation easier to read. For any -nonterminal $x$: -\begin{itemize} -\item $\epsilon$ denotes the empty nonterminal: - \begin{quote} - $\epsilon$ ::= - \end{quote} -\item @[$x$@] means an optional $x$: - \begin{quote} - \syntax{@[$x$@] ::= $\epsilon$ @! $x$} - \end{quote} -\item $x^*$ means a sequence of zero or more $x$s: - \begin{quote} - \syntax{$x^*$ ::= $\epsilon$ @! 
$x^*$ $x$} - \end{quote} -\item $x^+$ means a sequence of one or more $x$s: - \begin{quote} - \syntax{$x^+$ ::= $x$ $x^*$} - \end{quote} -\item $x$@<-list> means a sequence of one or more $x$s separated - by commas: - \begin{quote} - \syntax{$x$<-list> ::= $x$ @! $x$<-list> "," $x$} - \end{quote} -\end{itemize} - -\subsection{Lexical syntax} -\label{sec:syntax.lex} - -Whitespace and comments are discarded. The remaining characters are -collected into tokens according to the following syntax. - -\begin{grammar} - ::= -\alt -\alt -\alt -\alt -\end{grammar} - -This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal -munch} rule: at each stage we take the longest sequence of characters which -could be a token. - -\subsubsection{Identifiers} \label{sec:syntax.lex.id} - -\begin{grammar} - ::= @^* - - ::= | "_" - - ::= @! - - ::= "A" | "B" | \dots\ | "Z" -\alt "a" | "b" | \dots\ | "z" -\alt - - ::= "0" | - - ::= "1" | "2" $| \cdots |$ "9" -\end{grammar} - -The precise definition of @ is left to the function -\textsf{alpha-char-p} in the hosting Lisp system. For portability, -programmers are encouraged to limit themselves to the standard ASCII letters. - -There are no reserved words at the lexical level, but the higher-level syntax -recognizes certain identifiers as \emph{keywords} in some contexts. There is -also an ambiguity (inherited from C) in the declaration syntax which is -settled by distinguishing type names from other identifiers at a lexical -level. - -\subsubsection{String and character literals} \label{sec:syntax.lex.string} - -\begin{grammar} - ::= "\"" @^* "\"" - - ::= "'" "'" - - ::= any character other than "\\" or "\"" -\alt "\\" - - ::= any character other than "\\" or "'" -\alt "\\" - - ::= any single character -\end{grammar} - -The syntax for string and character literals differs from~C. In particular, -escape sequences such as @`\textbackslash n' are not recognized. 
The use -of string and character literals in Sod, outside of C~fragments, is limited, -and the simple syntax seems adequate. For the sake of future compatibility, -the use of character sequences which resemble C escape sequences is -discouraged. - -\subsubsection{Integer literals} \label{sec:syntax.lex.int} - -\begin{grammar} - ::= -\alt -\alt -\alt - - ::= @^* - - ::= "0" @("b"|"B"@) @^+ - - ::= "0" | "1" - - ::= "0" @["o"|"O"@] @^+ - - ::= "0" | "1" $| \cdots |$ "7" - - ::= "0" @("x"|"X"@) @^+ - - ::= -\alt "A" | "B" | "C" | "D" | "E" | "F" -\alt "a" | "b" | "c" | "d" | "e" | "f" -\end{grammar} - -Sod understands only integers, not floating-point numbers; its integer syntax -goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for -binary. However, length and signedness indicators are not permitted. - -\subsubsection{Punctuation} \label{sec:syntax.lex.punct} - -\begin{grammar} - ::= any nonalphanumeric character other than "_", "\"" or "'" -\end{grammar} - -\subsubsection{Comments} \label{sec:lex-comment} - -\begin{grammar} - ::= -\alt - - ::= - "/*" - @^* @(@^+ @^*@)^* - @^* - "*/" - - ::= "*" - - ::= any character other than "*" - - ::= any character other than "*" or "/" - - ::= "//" @^* - - ::= a newline character - - ::= any character other than newline -\end{grammar} - -Comments are exactly as in C99: both traditional block comments `\texttt{/*} -\dots\ \texttt{*/}' and \Cplusplus-style `\texttt{//} \dots' comments are -permitted and ignored. - -\subsection{Special nonterminals} -\label{sec:special-nonterminals} - -Aside from the lexical syntax presented above (\xref{sec:lexical-syntax}), -two special nonterminals occur in the module syntax. - -\subsubsection{S-expressions} \label{sec:syntax-sexp} - -\begin{grammar} - ::= an S-expression, as parsed by the Lisp reader -\end{grammar} - -When an S-expression is expected, the Sod parser simply calls the host Lisp -system's \textsf{read} function. 
Sod modules are permitted to modify the -read table to extend the S-expression syntax. - -S-expressions are self-delimiting, so no end-marker is needed. - -\subsubsection{C fragments} \label{sec:syntax.lex.cfrag} - -\begin{grammar} - ::= a sequence of C tokens, with matching brackets -\end{grammar} - -Sequences of C code are simply stored and written to the output unchanged -during translation. They are read using a simple scanner which nonetheless -understands C comments and string and character literals. - -A C fragment is terminated by one of a small number of delimiter characters -determined by the immediately surrounding context -- usually a closing brace -or bracket. The first such delimiter character which is not enclosed in -brackets, braces or parentheses ends the fragment. - -\subsection{Module syntax} \label{sec:syntax-module} - -\begin{grammar} - ::= @^* - - ::= -\alt -\alt -\alt -\alt -\alt -\end{grammar} - -A module is the top-level syntactic item. A module consists of a sequence of -definitions. - -\subsection{Simple definitions} \label{sec:syntax.defs} - -\subsubsection{Importing modules} \label{sec:syntax.defs.import} - -\begin{grammar} - ::= "import" ";" -\end{grammar} - -The module named @ is processed and its definitions made available. - -A search is made for a module source file as follows. -\begin{itemize} -\item The module name @ is converted into a filename by appending - @`.sod', if it has no extension already.\footnote{% - Technically, what happens is \textsf{(merge-pathnames name (make-pathname - :type "SOD" :case :common))}, so exactly what this means varies - according to the host system.} % -\item The file is looked for relative to the directory containing the - importing module. -\item If that fails, then the file is looked for in each directory on the - module search path in turn. -\item If the file still isn't found, an error is reported and the import - fails. 
-\end{itemize} -At this point, if the file has previously been imported, nothing further -happens.\footnote{% - This check is done using \textsf{truename}, so it should see through simple - tricks like symbolic links. However, it may be confused by fancy things - like bind mounts and so on.} % - -Recursive imports, either direct or indirect, are an error. - -\subsubsection{Loading extensions} \label{sec:syntax.defs.load} - -\begin{grammar} - ::= "load" ";" -\end{grammar} - -The Lisp file named @ is loaded and evaluated. - -A search is made for a Lisp source file as follows. -\begin{itemize} -\item The name @ is converted into a filename by appending @`.lisp', - if it has no extension already.\footnote{% - Technically, what happens is \textsf{(merge-pathnames name (make-pathname - :type "LISP" :case :common))}, so exactly what this means varies - according to the host system.} % -\item A search is then made in the same manner as for module imports - (\xref{sec:syntax-module}). -\end{itemize} -If the file is found, it is loaded using the host Lisp's \textsf{load} -function. - -Note that Sod doesn't attempt to compile Lisp files, or even to look for -existing compiled files. The right way to package a substantial extension to -the Sod translator is to provide the extension as a standard ASDF system (or -similar) and leave a dropping @"foo-extension.lisp" in the module path saying -something like -\begin{quote} - \textsf{(asdf:load-system :foo-extension)} -\end{quote} -which will arrange for the extension to be compiled if necessary. - -(This approach means that the language doesn't need to depend on any -particular system definition facility. It's bad enough already that it -depends on Common Lisp.) - -\subsubsection{Lisp escapes} \label{sec:syntax.defs.lisp} - -\begin{grammar} - ::= "lisp" ";" -\end{grammar} - -The @ is evaluated immediately. It can do anything it likes. - -\textbf{Warning!} This means that hostile Sod modules are a security hazard. 
-Lisp code can read and write files, start other programs, and make network -connections. Don't install Sod modules from sources that you don't -trust.\footnote{% - Presumably you were going to run the corresponding code at some point, so - this isn't as unusually scary as it sounds. But please be careful.} % - -\subsubsection{Declaring type names} \label{sec:syntax.defs.typename} - -\begin{grammar} - ::= - "typename" ";" -\end{grammar} - -Each @ is declared as naming a C type. This is important because -the C type syntax -- which Sod uses -- is ambiguous, and disambiguation is -done by distinguishing type names from other identifiers. - -Don't declare class names using @"typename"; use @"class" forward -declarations instead. - -\subsection{Literal code} \label{sec:syntax-code} - -\begin{grammar} - ::= - "code" ":" @[@] - "{" "}" - - ::= "[" "]" - - ::= @^+ -\end{grammar} - -The @ will be output unchanged to one of the output files. - -The first @ is the symbolic name of an output file. Predefined -output file names are @"c" and @"h", which are the implementation code and -header file respectively; other output files can be defined by extensions. - -The second @ provides a name for the output item. Several C -fragments can have the same name: they will be concatenated together in the -order in which they were encountered. - -The @ provide a means for specifying where in the output file -the output item should appear. (Note the two kinds of square brackets shown -in the syntax: square brackets must appear around the constraints if they are -present, but the constraints themselves may be omitted.) Each comma-separated @ -is a sequence of identifiers naming output items, and indicates that the -output items must appear in the order given -- though the translator is free -to insert additional items in between them. (The particular output items -needn't be defined already -- indeed, they needn't be defined ever.) 
+\frontmatter -There is a predefined output item @"includes" in both the @"c" and @"h" -output files which is a suitable place for inserting @"\#include" -preprocessor directives in order to declare types and functions for use -elsewhere in the generated output files. +\tableofcontents -\subsection{Property sets} \label{sec:syntax.propset} - -\begin{grammar} - ::= "[" "]" - - ::= "=" -\end{grammar} - -Property sets are a means for associating miscellaneous information with -classes and related items. By using property sets, additional information -can be passed to extensions without the need to introduce idiosyncratic -syntax. - -A property has a name, given as an @, and a value computed by -evaluating an @. The value can be one of a number of types, -though the only operators currently defined act on integer values only. - -\subsubsection{The expression evaluator} \label{sec:syntax.propset.expr} - -\begin{grammar} - ::= | "+" | "-" - - ::= | "*" | "/" - - ::= | "+" | "-" - - ::= - | | | -\alt "?" -\alt "(" ")" -\end{grammar} - -The arithmetic expression syntax is simple and standard; there are currently -no bitwise, logical, or comparison operators. - -A @ expression may be a literal or an identifier. Note that -identifiers stand for themselves: they \emph{do not} denote values. For more -fancy expressions, the syntax -\begin{quote} - @"?" @ -\end{quote} -causes the @ to be evaluated using the Lisp \textsf{eval} -function. -%%% FIXME crossref to extension docs - -\subsection{C types} \label{sec:syntax.c-types} - -Sod's syntax for C types closely mirrors the standard C syntax. A C type has -two parts: a sequence of @s and a @. In -Sod, a type must contain at least one @ (i.e., -`implicit @"int"' is forbidden), and storage-class specifiers are not -recognized. 
- -\subsubsection{Declaration specifiers} \label{sec:syntax.c-types.declspec} - -\begin{grammar} - ::= -\alt "struct" | "union" | "enum" -\alt "void" | "char" | "int" | "float" | "double" -\alt "short" | "long" -\alt "signed" | "unsigned" -\alt - - ::= "const" | "volatile" | "restrict" - - ::= -\end{grammar} - -A @ is an identifier which has been declared as being a type name, -using the @"typename" or @"class" definitions. - -Declaration specifiers may appear in any order. However, not all -combinations are permitted. A declaration specifier must consist of zero or -more @, and one of the following, up to reordering. -\begin{itemize} -\item @ -\item @"struct" @, @"union" @, @"enum" @ -\item @"void" -\item @"char", @"unsigned char", @"signed char" -\item @"short", @"unsigned short", @"signed short" -\item @"short int", @"unsigned short int", @"signed short int" -\item @"int", @"unsigned int", @"signed int", @"unsigned", @"signed" -\item @"long", @"unsigned long", @"signed long" -\item @"long int", @"unsigned long int", @"signed long int" -\item @"long long", @"unsigned long long", @"signed long long" -\item @"long long int", @"unsigned long long int", @"signed long long int" -\item @"float", @"double", @"long double" -\end{itemize} -All of these have their usual C meanings. - -\subsubsection{Declarators} \label{sec:syntax.c-types.declarator} - -\begin{grammar} -$[k]$ ::= @^* $[k]$ - -$[k]$ ::= $k$ -\alt "(" $[k]$ ")" -\alt $[k]$ @^* - - ::= "*" @^* - - ::= "[" "]" -\alt "(" ")" - - ::= $\epsilon$ | "..." -\alt @["," "..."@] - - ::= @^+ - - ::= @[ @! $\epsilon$@] - - ::= @[@] - - ::= "." - - ::= @[@] -\end{grammar} - -The declarator syntax is taken from C, but with some differences. -\begin{itemize} -\item Array dimensions are uninterpreted @, terminated by a - closing square bracket. This allows array dimensions to contain arbitrary - constant expressions. 
-\item A declarator may have either a single @ at its centre or a - pair of @s separated by a @`.'; this is used to refer to - slots or messages defined in superclasses. -\end{itemize} -The remaining differences are (I hope) a matter of presentation rather than -substance. - -\subsection{Defining classes} \label{sec:syntax.class} - -\begin{grammar} - ::= -\alt -\end{grammar} - -\subsubsection{Forward declarations} \label{sec:class.class.forward} - -\begin{grammar} - ::= "class" ";" -\end{grammar} - -A @ informs Sod that an @ will be used -to name a class which is currently undefined. Forward declarations are -necessary in order to resolve certain kinds of circularity. For example, -\begin{listing} -class Sub; - -class Super : SodObject { - Sub *sub; -}; - -class Sub : Super { - /* ... */ -}; -\end{listing} - -\subsubsection{Full class definitions} \label{sec:class.class.full} - -\begin{grammar} - ::= - @[@] - "class" ":" - "{" @^* "}" - - ::= ";" -\alt -\alt -\alt ";" -\end{grammar} - -A full class definition provides a complete description of a class. - -The first @ gives the name of the class. It is an error to -give the name of an existing class (other than a forward-referenced class), -or an existing type name. It is conventional to give classes `MixedCase' -names, to distinguish them from other kinds of identifiers. - -The @ names the direct superclasses for the new class. It -is an error if any of these @s does not name a defined class. - -The @ provide additional information. The standard class -properties are as follows. -\begin{description} -\item[@"lisp_class"] The name of the Lisp class to use within the translator - to represent this class. The property value must be an identifier; the - default is @"sod_class". Extensions may define classes with additional - behaviour, and may recognize additional class properties. -\item[@"metaclass"] The name of the Sod metaclass for this class. 
In the - generated code, a class is itself an instance of another class -- its - \emph{metaclass}. The metaclass defines which slots the class will have, - which messages it will respond to, and what its behaviour will be when it - receives them. The property value must be an identifier naming a defined - subclass of @"SodClass". The default metaclass is @"SodClass". - %%% FIXME xref to theory -\item[@"nick"] A nickname for the class, to be used to distinguish it from - other classes in various limited contexts. The property value must be an - identifier; the default is constructed by forcing the class name to - lower-case. -\end{description} - -The class body consists of a sequence of @s enclosed in braces. -These items are discussed in the following sections. - -\subsubsection{Slot items} \label{sec:sntax.class.slot} - -\begin{grammar} - ::= - @[@] - @^+ - - ::= @["=" @] -\end{grammar} - -A @ defines one or more slots. All instances of the class and any -subclass will contain these slots, with the names and types given by the -@ and the @. Slot declarators may not -contain qualified identifiers. - -It is not possible to declare a slot with function type: such an item is -interpreted as being a @ or @. Pointers to -functions are fine. - -An @, if present, is treated as if a separate -@ containing the slot name and initializer were present. -For example, -\begin{listing} -[nick = eg] -class Example : Super { - int foo = 17; -}; -\end{listing} -means the same as -\begin{listing} -[nick = eg] -class Example : Super { - int foo; - eg.foo = 17; -}; -\end{listing} - -\subsubsection{Initializer items} \label{sec:syntax.class.init} - -\begin{grammar} - ::= @["class"@] - - ::= "=" - - ::= "{" "}" | -\end{grammar} - -An @ provides an initial value for one or more slots. If -prefixed by @"class", then the initial values are for class slots (i.e., -slots of the class object itself); otherwise they are for instance slots. 
- -The first component of the @ must be the nickname of -one of the class's superclasses (including itself); the second must be the -name of a slot defined in that superclass. - -The initializer has one of two forms. -\begin{itemize} -\item A @ enclosed in braces denotes an aggregate initializer. - This is suitable for initializing structure, union or array slots. -\item A @ \emph{not} beginning with an open brace is a `bare' - initializer, and continues until the next @`,' or @`;' which is not within - nested brackets. Bare initializers are suitable for initializing scalar - slots, such as pointers or integers, and strings. -\end{itemize} - -\subsubsection{Message items} \label{sec:syntax.class.message} - -\begin{grammar} - ::= - @[@] - @^+ @[@] -\end{grammar} - -\subsubsection{Method items} \label{sec:syntax.class.method} - -\begin{grammar} - ::= - @[@] - @^+ - - ::= "{" "}" | "extern" ";" -\end{grammar} +\mainmatter %%%-------------------------------------------------------------------------- -\section{Class objects} - -\begin{listing} -typedef struct SodClass__ichain_obj SodClass; - -struct sod_chain { - size_t n_classes; /* Number of classes in chain */ - const SodClass *const *classes; /* Vector of classes, head first */ - size_t off_ichain; /* Offset of ichain from instance base */ - const struct sod_vtable *vt; /* Vtable pointer for chain */ - size_t ichainsz; /* Size of the ichain structure */ -}; - -struct sod_vtable { - SodClass *_class; /* Pointer to instance's class */ - size_t _base; /* Offset to instance base */ -}; - -struct SodClass__islots { - - /* Basic information */ - const char *name; /* The class's name as a string */ - const char *nick; /* The nickname as a string */ - - /* Instance allocation and initialization */ - size_t instsz; /* Instance layout size in bytes */ - void *(*imprint)(void *); /* Stamp instance with vtable ptrs */ - void *(*init)(void *); /* Initialize instance */ - - /* Superclass structure */ - size_t n_supers; /* Number 
of direct superclasses */ - const SodClass *const *supers; /* Vector of direct superclasses */ - size_t n_cpl; /* Length of class precedence list */ - const SodClass *const *cpl; /* Vector for class precedence list */ +\part{Tutorial} \label{p:tut} - /* Chain structure */ - const SodClass *link; /* Link to next class in chain */ - const SodClass *head; /* Pointer to head of chain */ - size_t level; /* Index of class in its chain */ - size_t n_chains; /* Number of superclass chains */ - const sod_chain *chains; /* Vector of chain structures */ - - /* Layout */ - size_t off_islots; /* Offset of islots from ichain base */ - size_t islotsz; /* Size of instance slots */ -}; - -struct SodClass__ichain_obj { - const SodClass__vt_obj *_vt; - struct SodClass__islots cls; -}; - -struct sod_instance { - struct sod_vtable *_vt; -}; -\end{listing} - -\begin{listing} -void *sod_convert(const SodClass *cls, const void *obj) -{ - const struct sod_instance *inst = obj; - const SodClass *real = inst->_vt->_cls; - const struct sod_chain *chain; - size_t i, index; - - for (i = 0; i < real->cls.n_chains; i++) { - chain = &real->cls.chains[i]; - if (chain->classes[0] == cls->cls.head) { - index = cls->cls.index; - if (index < chain->n_classes && chain->classes[index] == cls) - return ((char *)cls - inst->_vt._base + chain->off_ichain); - else - return (0); - } - } - return (0); -} -\end{listing} +\include{tutorial} %%%-------------------------------------------------------------------------- -\section{Classes} -\label{sec:class} - -\subsection{Classes and superclasses} \label{sec:class.defs} - -A @ must list one or more existing classes to be the -\emph{direct superclasses} for the new class being defined. We make the -following definitions. -\begin{itemize} -\item The \emph{superclasses} of a class consist of the class itself together - with the superclasses of its direct superclasses. -\item The \emph{proper superclasses} of a class are its superclasses other - than itself. 
-\item If $C$ is a (proper) superclass of $D$ then $D$ is a (\emph{proper}) - \emph{subclass} of $C$. -\end{itemize} -The predefined class @|SodObject| has no direct superclasses; it is unique in -this respect. All classes are subclasses of @|SodObject|. - -\subsection{The class precedence list} \label{sec:class.cpl} - -Let $C$ be a class. The superclasses of $C$ form a directed graph, with an -edge from each class to each of its direct superclasses. This is the -\emph{superclass graph of $C$}. - -In order to resolve inheritance of items, we define a \emph{class precedence - list} (or CPL) for each class, which imposes a total order on that class's -superclasses. The default algorithm for computing the CPL is the \emph{C3} -algorithm \cite{fixme-c3}, though extensions may implement other algorithms. - -The default algorithm works as follows. Let $C$ be the class whose CPL we -are to compute. Let $X$ and $Y$ be two of $C$'s superclasses. -\begin{itemize} -\item $C$ must appear first in the CPL. -\item If $X$ appears before $Y$ in the CPL of one of $C$'s direct - superclasses, then $X$ appears before $Y$ in $C$'s CPL. -\item If the above rules don't suffice to order $X$ and $Y$, then whichever - of $X$ and $Y$ has a subclass which appears further left in the list of - $C$'s direct superclasses will appear earlier in the CPL. -\end{itemize} -This last rule is sufficient to disambiguate because if both $X$ and $Y$ are -superclasses of the same direct superclass of $C$ then that direct -superclass's CPL will order $X$ and $Y$. - -We say that \emph{$X$ is more specific than $Y$ as a superclass of $C$} if -$X$ is earlier than $Y$ in $C$'s class precedence list. If $C$ is clear from -context then we omit it, saying simply that $X$ is more specific than $Y$. - -\subsection{Instances and metaclasses} \label{sec:class.meta} - -A class defines the structure and behaviour of its \emph{instances}: run-time -objects created (possibly) dynamically. 
An instance is an instance of only -one class, though structurally it may be used in place of an instance of any -of that class's superclasses. It is possible, with care, to change the class -of an instance at run-time. +\part{Reference} \label{p:ref} -Classes are themselves represented as instances -- called \emph{class - objects} -- in the running program. Being instances, they have a class, -called the \emph{metaclass}. The metaclass defines the structure and -behaviour of the class object. - -The predefined class @|SodClass| is the default metaclass for new classes. -@|SodClass| has @|SodObject| as its only direct superclass. @|SodClass| is -its own metaclass. - -To make matters more complicated, Sod has \emph{two} distinct metalevels: as -well as the runtime metalevel, as discussed above, there's a compile-time -metalevel hosted in the Sod translator. Since Sod is written in Common Lisp, -a Sod class's compile-time metaclass is a CLOS class. The usual compile-time -metaclass is @|sod-class|. The compile-time metalevel is the subject of -\xref{ch:api}. - -\subsection{Items and inheritance} \label{sec:class.inherit} - -A class definition also declares \emph{slots}, \emph{messages}, -\emph{initializers} and \emph{methods} -- collectively referred to as -\emph{items}. In addition to the items declared in the class definition -- -the class's \emph{direct items} -- a class also \emph{inherits} items from -its superclasses. - -The precise rules for item inheritance vary according to the kinds of items -involved. - -Some object systems have a notion of `repeated inheritance': if there are -multiple paths in the superclass graph from a class to one of its -superclasses then items defined in that superclass may appear duplicated in -the subclass. Sod does not have this notion. - -\subsubsection{Slots} \label{sec:class.inherit.slots} -A \emph{slot} is a unit of state. In other object systems, slots may be -called `fields', `member variables', or `instance variables'. 
- -A slot has a \emph{name} and a \emph{type}. The name serves only to -distinguish the slot from other direct slots defined by the same class. A -class inherits all of its proper superclasses' slots. Slots inherited from -superclasses do not conflict with each other or with direct slots, even if -they have the same names. - -At run-time, each instance of the class holds a separate value for each slot, -whether direct or inherited. Changing the value of an instance's slot -doesn't affect other instances. - -\subsubsection{Initializers} \label{sec:class.inherit.init} -Mumble. - -\subsubsection{Messages} \label{sec:class.inherit.messages} -A \emph{message} is the stimulus for behaviour. In Sod, a class must define, -statically, the name and format of the messages it is able to receive and the -values it will return in reply. In this respect, a message is similar to -`abstract member functions' or `interface member functions' in other object -systems. - -Like slots, a message has a \emph{name} and a \emph{type}. Again, the name -serves only to distinguish the message from other direct messages defined by -the same class. Messages inherited from superclasses do not conflict with -each other or with direct messages, even if they have the same name. - -At run-time, one sends a message to an instance by invoking a function -obtained from the instance's \emph{vtable}: \xref{sec:fixme-vtable}. - -\subsubsection{Methods} \label{sec:class.inherit.methods} -A \emph{method} is a unit of behaviour. In other object systems, methods may -be called `member functions'. - -A method is associated with a message. When a message is received by an -instance, all of the methods associated with that message on the instance's -class or any of its superclasses are \emph{applicable}. The details of how -the applicable methods are invoked are described fully in -\xref{sec:fixme-method-combination}. 
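The relationship between messages, methods and the vtable described above can be sketched in plain C. This is a minimal illustration under assumed names: @|struct counter|, @|counter_vtable|, @|counter_bump| and @|send_bump| are made up for the example and are not what the Sod translator generates. It models a single class with one message and one method, and ignores method combination entirely.

```c
/* Hypothetical instance type with one slot and one message. */
struct counter;

/* The vtable holds one function pointer per message the class accepts. */
struct counter_vtable {
  int (*bump)(struct counter *me, int by);
};

/* Each instance carries a pointer to its class's shared vtable. */
struct counter {
  const struct counter_vtable *vt;
  int n;                        /* a slot: per-instance state */
};

/* A method: the behaviour run when the `bump' message arrives. */
static int counter_bump(struct counter *me, int by)
{
  me->n += by;
  return me->n;
}

/* One vtable is shared by every instance of the class. */
static const struct counter_vtable counter_vt = { counter_bump };

/* Sending a message: fetch the function from the vtable and call it. */
static int send_bump(struct counter *obj, int by)
{
  return obj->vt->bump(obj, by);
}
```

The caller never names @|counter_bump| directly; it only goes through the vtable, which is what would let a subclass supply different behaviour for the same message.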
- -\subsection{Chains and instance layout} \label{sec:class.layout} - -C is a rather low-level language, and in particular it exposes details of the -way data is laid out in memory. Since an instance of a class~$C$ should be -(at least in principle) usable anywhere an instance of some superclass $B -\succeq C$ is expected, this implies that an instance of the subclass $C$ -needs to contain within it a complete instance of each superclass $B$, laid -out according to the rules of instances of $B$, so that if we have (the -address of) an instance of $C$, we can easily construct a pointer to a thing -which looks like an instance of $B$ contained within it. - -Specifically, the information we need to retain for an instance of a -class~$C$ is: -\begin{itemize} -\item the values of each of the slots defined by $C$, including those defined - by superclasses; -\item information which will let us convert a pointer to $C$ into a pointer - to any superclass $B \succeq C$; -\item information which will let us call the appropriate effective method for - each message defined by $C$, including those defined by superclasses; and -\item some additional meta-level information, such as how to find the class - object for $C$ given (the address of) one of its instances. -\end{itemize} - -Observe that, while each distinct instance must clearly have its own storage -for slots, all instances of $C$ can share a single copy of the remaining -information. The individual instance only needs to keep a pointer to this -shared table, which, inspired by the similar structure in many \Cplusplus\ -ABIs, is called a \emph{vtable}. - -The easiest approach would be to decide that instances of $C$ are exactly -like instances of $B$, only with extra space at the end for the extra slots -which $C$ defines over and above those already existing in $B$. Conversion -is then trivial: a pointer to an instance of $C$ can be converted to a -pointer to an instance of some superclass $B$ simply by casting. 
Even though -the root class @|SodObject| doesn't have any slots at all, its instances will -still need a vtable so that you can find its class object: the address of the -vtable therefore needs to be at the very start of the instance structure. -Again, a vtable for a superclass would have a vtable for each of its -superclasses as a prefix, with new items added afterwards. - -This appealing approach works well for an object system which only permits -single inheritance of both state and behaviour. Alas, it breaks down when -multiple inheritance is allowed: $C$ can be a subclass of both $B$ and $B'$, -even though $B$ is not a subclass of $B'$, nor \emph{vice versa}; so, in -general, $B$'s instance structure will not be a prefix of $B'$'s, nor will -$B'$'s be a prefix of $B$'s, and therefore $C$ cannot have both $B$ and $B'$ -as a prefix. - -A (non-root) class may -- though need not -- have a distinguished \emph{link} -superclass, which need not be a direct superclass. Furthermore, each -class~$C$ must satisfy the \emph{chain condition}: for any superclass $A$ of -$C$, there can be at most one other superclass of $C$ whose link superclass -is $A$.\footnote{% - That is, it's permitted for two classes $B$ and $B'$ to have the same link - superclass $A$, but $B$ and $B'$ can't then both be superclasses of the - same class $C$.} % -Therefore, the links partition the superclasses of~$C$ into nice linear -\emph{chains}, such that each superclass is a member of exactly one chain. -If a class~$B$ has a link superclass~$A$, then $B$'s \emph{level} is one more -than that of $A$; otherwise $B$ is called a \emph{chain head} and its level -is zero. If the classes in a chain are written in a list, chain head first, -then the level of each class gives its index in the list. - -Chains therefore allow us to recover some of the linearity properties which -made layout simple in the case of single inheritance. 
The instance structure -for a class $C$ contains a substructure for each of $C$'s superclass chains; -a pointer to an object of class $C$ actually points to the substructure for -the chain containing $C$. The order of these substructures is unimportant -for now.\footnote{% - The chains appear in the order in which their most specific classes appear - in $C$'s class precedence list. This guarantees that the chain containing - $C$ itself appears first, so that a pointer to $C$'s instance structure is - actually a pointer to $C$'s chain substructure. Apart from that, it's a - simple, stable, but basically arbitrary choice which can't be changed - without breaking the ABI.} % -The substructure for each chain begins with a pointer to a vtable, followed -by a structure for each superclass in the chain containing the slots defined -by that superclass, with the chain head (least specific class) first. - -Suppose we have a pointer to (static) type $C$, and want to convert it into a -pointer to some superclass $B$ of $C$ -- an \emph{upcast}.\footnote{% - In the more general case, we have a pointer to static type $C$, which - actually points to an object of some subclass $D$ of $C$, and want to - convert it into a pointer to type $B$. Such a conversion is called a - \emph{downcast} if $B$ is a subclass of $C$, or a \emph{cross-cast} - otherwise. Downcasts and cross-casts require complicated run-time - checking, and will fail unless $B$ is a superclass of $D$.} % -If $B$ is in the same chain as $C$ -- an \emph{in-chain upcast} -- then the -pointer value is already correct and it's only necessary to cast it -appropriately. Otherwise -- a \emph{cross-chain upcast} -- the pointer needs -to be adjusted to point to a different chain substructure. Since the lengths -and relative positions of the chain substructures vary between classes, the -adjustments are stored in the vtable. Cross-chain upcasts are therefore a -bit slower than in-chain upcasts. 
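The chain layout and the two kinds of upcast just described can be sketched in plain C. This is only an illustration under assumed names: @|struct vtable|, the @|*_ichain| and @|c_ilayout| structures, and the functions @|c_to_a| and @|c_to_b| are invented for the example and are not the structures the Sod translator actually generates. It supposes a class C whose link superclass is A, and which also has a superclass B sitting in a separate chain.

```c
#include <stddef.h>

/* Hypothetical per-chain vtable: only the offsets needed for upcasts. */
struct vtable {
  size_t base;                  /* offset of this chain from instance base */
  size_t off_b;                 /* offset of B's chain from instance base */
};

struct a_slots { int x; };
struct b_slots { int y; };
struct c_slots { int z; };

/* Chain substructures: vtable pointer first, then slots, chain head first. */
struct a_ichain { const struct vtable *vt; struct a_slots a; };
struct b_ichain { const struct vtable *vt; struct b_slots b; };
struct c_ichain { struct a_ichain head; struct c_slots c; };

/* A whole instance of C: the chain containing C itself comes first. */
struct c_ilayout { struct c_ichain a_chain; struct b_ichain b_chain; };

/* One vtable per chain, because the offsets depend on where we start. */
static const struct vtable c_vt_a = {
  offsetof(struct c_ilayout, a_chain), offsetof(struct c_ilayout, b_chain)
};
static const struct vtable c_vt_b = {
  offsetof(struct c_ilayout, b_chain), offsetof(struct c_ilayout, b_chain)
};

/* Stamp the vtable pointers and give the slots some recognizable values. */
static void c_init(struct c_ilayout *obj)
{
  obj->a_chain.head.vt = &c_vt_a;
  obj->b_chain.vt = &c_vt_b;
  obj->a_chain.head.a.x = 1;
  obj->b_chain.b.y = 2;
  obj->a_chain.c.z = 3;
}

/* In-chain upcast: the address is unchanged, only the type differs. */
static struct a_ichain *c_to_a(struct c_ichain *p)
{
  return &p->head;
}

/* Cross-chain upcast: step back to the instance base using the offset in
 * the vtable, then forward to the target chain's substructure. */
static struct b_ichain *c_to_b(struct c_ichain *p)
{
  const struct vtable *vt = p->head.vt;
  char *base = (char *)p - vt->base;
  return (struct b_ichain *)(void *)(base + vt->off_b);
}
```

Nesting @|struct a_ichain| at the start of @|struct c_ichain| is a choice made for this sketch so that the in-chain upcast is ordinary member access; the resulting layout is still a vtable pointer followed by the slot structures, chain head first.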
- -Each chain has its own separate vtable, because much of the metadata stored -in the vtable is specific to a particular chain. For example: -\begin{itemize} -\item offsets to other chains' substructures will vary depending on which - chain we start from; and -\item entry points to methods { +\include{concepts} +\include{cmdline} +\include{syntax} +\include{structures} +\include{runtime} %%%-------------------------------------------------------------------------- -\chapter{The Lisp programming interface} \label{ch:api} - -%% output for `h' files -%% -%% prologue -%% guard start -%% typedefs start -%% typedefs -%% typedefs end -%% includes start -%% includes -%% includes end -%% classes start -%% CLASS banner -%% CLASS islots start -%% CLASS islots slots -%% CLASS islots end -%% CLASS vtmsgs start -%% CLASS vtmsgs CLASS start -%% CLASS vtmsgs CLASS slots -%% CLASS vtmsgs CLASS end -%% CLASS vtmsgs end -%% CLASS vtables start -%% CLASS vtables CHAIN-HEAD start -%% CLASS vtables CHAIN-HEAD slots -%% CLASS vtables CHAIN-HEAD end -%% CLASS vtables end -%% CLASS vtable-externs -%% CLASS vtable-externs-after -%% CLASS methods start -%% CLASS methods -%% CLASS methods end -%% CLASS ichains start -%% CLASS ichains CHAIN-HEAD start -%% CLASS ichains CHAIN-HEAD slots -%% CLASS ichains CHAIN-HEAD end -%% CLASS ichains end -%% CLASS ilayout start -%% CLASS ilayout slots -%% CLASS ilayout end -%% CLASS conversions -%% CLASS object -%% classes end -%% guard end -%% epilogue - -%% output for `c' files -%% -%% prologue -%% includes start -%% includes -%% includes end -%% classes start -%% CLASS banner -%% CLASS direct-methods start -%% CLASS direct-methods METHOD start -%% CLASS direct-methods METHOD body -%% CLASS direct-methods METHOD end -%% CLASS direct-methods end -%% CLASS effective-methods -%% CLASS vtables start -%% CLASS vtables CHAIN-HEAD start -%% CLASS vtables CHAIN-HEAD class-pointer METACLASS -%% CLASS vtables CHAIN-HEAD base-offset -%% CLASS vtables CHAIN-HEAD 
chain-offset TARGET-HEAD -%% CLASS vtables CHAIN-HEAD vtmsgs CLASS start -%% CLASS vtables CHAIN-HEAD vtmsgs CLASS slots -%% CLASS vtables CHAIN-HEAD vtmsgs CLASS end -%% CLASS vtables CHAIN-HEAD end -%% CLASS vtables end -%% CLASS object prepare -%% CLASS object start -%% CLASS object CHAIN-HEAD ichain start -%% CLASS object SUPER slots start -%% CLASS object SUPER slots -%% CLASS object SUPER vtable -%% CLASS object SUPER slots end -%% CLASS object CHAIN-HEAD ichain end -%% CLASS object end -%% classes end -%% epilogue +\part{Lisp interface} \label{p:lisp} + +\include{lispintro} +%% package.lisp +%% sod.asd.in +%% sod-frontend.asd.in +%% auto.lisp.in + +\include{misc} +%% pset-impl.lisp +%% pset-parse.lisp +%% pset-proto.lisp +%% lexer-bits.lisp +%% lexer-impl.lisp +%% lexer-proto.lisp +%% utilities.lisp +%% optparse.lisp +%% frontend.lisp +%% final.lisp + +\include{parsing} +%% package.lisp +%% floc-impl.lisp +%% floc-proto.lisp +%% streams-impl.lisp +%% streams-proto.lisp +%% scanner-context-impl.lisp +%% scanner-impl.lisp +%% scanner-proto.lisp +%% scanner-token-impl.lisp +%% scanner-charbuf-impl.lisp +%% parser-impl.lisp +%% parser-proto.lisp +%% parser-expr-impl.lisp +%% parser-expr-proto.lisp + +\include{clang} +%% c-types-class-impl.lisp +%% c-types-impl.lisp +%% c-types-parse.lisp +%% c-types-proto.lisp +%% codegen-impl.lisp +%% codegen-proto.lisp +%% fragment-parse.lisp + +\include{meta} +%% classes.lisp +%% class-utilities.lisp +%% class-make-impl.lisp +%% class-make-proto.lisp +%% class-finalize-impl.lisp +%% class-finalize-proto.lisp + +\include{layout} +%% class-layout-impl.lisp +%% class-layout-proto.lisp +%% method-impl.lisp +%% method-proto.lisp +%% method-aggregate.lisp + +\include{module} +%% module-impl.lisp +%% module-parse.lisp +%% module-proto.lisp +%% builtin.lisp + +\include{output} +%% output-impl.lisp +%% output-proto.lisp +%% class-output.lisp +%% module-output.lisp 
%%%-------------------------------------------------------------------------- +\part{Appendices} +\appendix -\include{sod-backg} -\include{sod-protocol} +\include{cutting-room-floor} +%%%----- That's all, folks -------------------------------------------------- \end{document} - + %%% Local variables: %%% mode: LaTeX %%% TeX-PDF-mode: t diff --git a/doc/sod.toc b/doc/sod.toc new file mode 100644 index 0000000..55d77c6 --- /dev/null +++ b/doc/sod.toc @@ -0,0 +1,96 @@ +\contentsline {chapter}{Contents}{i}{chapter*.1} +\contentsline {part}{I\hspace {1em}Tutorial}{1}{part.1} +\contentsline {chapter}{\numberline {1}Tutorial}{3}{chapter.1} +\contentsline {section}{\numberline {1.1}Introduction}{3}{section.1.1} +\contentsline {subsection}{\numberline {1.1.1}Building programs with Sod}{4}{subsection.1.1.1} +\contentsline {section}{\numberline {1.2}A traditional trivial introduction}{4}{section.1.2} +\contentsline {part}{II\hspace {1em}Reference}{7}{part.2} +\contentsline {chapter}{\numberline {2}Concepts}{9}{chapter.2} +\contentsline {section}{\numberline {2.1}Classes and slots}{9}{section.2.1} +\contentsline {section}{\numberline {2.2}Messages and methods}{9}{section.2.2} +\contentsline {section}{\numberline {2.3}Metaclasses}{9}{section.2.3} +\contentsline {section}{\numberline {2.4}Modules}{9}{section.2.4} +\contentsline {chapter}{\numberline {3}Module syntax}{11}{chapter.3} +\contentsline {subsection}{\numberline {3.0.1}Lexical syntax}{11}{subsection.3.0.1} +\contentsline {subsubsection}{Identifiers}{11}{section*.2} +\contentsline {subsubsection}{String and character literals}{12}{section*.3} +\contentsline {subsubsection}{Integer literals}{12}{section*.4} +\contentsline {subsubsection}{Punctuation}{13}{section*.5} +\contentsline {subsubsection}{Comments}{13}{section*.6} +\contentsline {subsection}{\numberline {3.0.2}Special nonterminals}{13}{subsection.3.0.2} +\contentsline {subsubsection}{S-expressions}{13}{section*.7} +\contentsline {subsubsection}{C 
fragments}{13}{section*.8} +\contentsline {subsection}{\numberline {3.0.3}Module syntax}{14}{subsection.3.0.3} +\contentsline {subsection}{\numberline {3.0.4}Simple definitions}{14}{subsection.3.0.4} +\contentsline {subsubsection}{Importing modules}{14}{section*.9} +\contentsline {subsubsection}{Loading extensions}{14}{section*.10} +\contentsline {subsubsection}{Lisp escapes}{15}{section*.11} +\contentsline {subsubsection}{Declaring type names}{15}{section*.12} +\contentsline {subsection}{\numberline {3.0.5}Literal code}{15}{subsection.3.0.5} +\contentsline {subsection}{\numberline {3.0.6}Property sets}{16}{subsection.3.0.6} +\contentsline {subsubsection}{The expression evaluator}{16}{section*.13} +\contentsline {subsection}{\numberline {3.0.7}C types}{16}{subsection.3.0.7} +\contentsline {subsubsection}{Declaration specifiers}{17}{section*.14} +\contentsline {subsubsection}{Declarators}{17}{section*.15} +\contentsline {subsection}{\numberline {3.0.8}Defining classes}{18}{subsection.3.0.8} +\contentsline {subsubsection}{Forward declarations}{18}{section*.16} +\contentsline {subsubsection}{Full class definitions}{18}{section*.17} +\contentsline {subsubsection}{Slot items}{19}{section*.18} +\contentsline {subsubsection}{Initializer items}{20}{section*.19} +\contentsline {subsubsection}{Message items}{20}{section*.20} +\contentsline {subsubsection}{Method items}{20}{section*.21} +\contentsline {part}{III\hspace {1em}Lisp interface}{21}{part.3} +\contentsline {chapter}{\numberline {4}Protocol overview}{23}{chapter.4} +\contentsline {section}{\numberline {4.1}A tour through the translator}{23}{section.4.1} +\contentsline {section}{\numberline {4.2}Specification conventions}{23}{section.4.2} +\contentsline {subsection}{\numberline {4.2.1}Format of the entries}{24}{subsection.4.2.1} +\contentsline {chapter}{\numberline {5}Parsing}{27}{chapter.5} +\contentsline {section}{\numberline {5.1}The parser protocol}{27}{section.5.1} +\contentsline {section}{\numberline {5.2}File 
locations}{27}{section.5.2} +\contentsline {section}{\numberline {5.3}Scanners}{27}{section.5.3} +\contentsline {subsection}{\numberline {5.3.1}Basic scanner protocol}{27}{subsection.5.3.1} +\contentsline {subsection}{\numberline {5.3.2}Place-capture scanner protocol}{28}{subsection.5.3.2} +\contentsline {subsection}{\numberline {5.3.3}Scanner file-location protocol}{29}{subsection.5.3.3} +\contentsline {subsection}{\numberline {5.3.4}Character scanners}{29}{subsection.5.3.4} +\contentsline {subsubsection}{Stream access to character scanners}{30}{section*.22} +\contentsline {subsection}{\numberline {5.3.5}String scanners}{31}{subsection.5.3.5} +\contentsline {subsection}{\numberline {5.3.6}Character buffer scanners}{31}{subsection.5.3.6} +\contentsline {subsection}{\numberline {5.3.7}Token scanners}{32}{subsection.5.3.7} +\contentsline {subsection}{\numberline {5.3.8}List scanners}{33}{subsection.5.3.8} +\contentsline {section}{\numberline {5.4}Parsing macros}{33}{section.5.4} +\contentsline {section}{\numberline {5.5}Lexical analyser}{33}{section.5.5} +\contentsline {chapter}{\numberline {6}C language utilities}{35}{chapter.6} +\contentsline {section}{\numberline {6.1}C type representation}{35}{section.6.1} +\contentsline {subsection}{\numberline {6.1.1}Overview}{35}{subsection.6.1.1} +\contentsline {subsubsection}{Constructing C type objects}{35}{section*.23} +\contentsline {subsubsection}{Printing}{35}{section*.24} +\contentsline {subsection}{\numberline {6.1.2}The C type root class}{35}{subsection.6.1.2} +\contentsline {subsection}{\numberline {6.1.3}C type S-expression notation}{36}{subsection.6.1.3} +\contentsline {subsection}{\numberline {6.1.4}Comparing C types}{37}{subsection.6.1.4} +\contentsline {subsection}{\numberline {6.1.5}Outputting C types}{38}{subsection.6.1.5} +\contentsline {subsection}{\numberline {6.1.6}Type qualifiers and qualifiable types}{39}{subsection.6.1.6} +\contentsline {subsection}{\numberline {6.1.7}Leaf types}{40}{subsection.6.1.7} 
+\contentsline {subsection}{\numberline {6.1.8}Compound C types}{43}{subsection.6.1.8} +\contentsline {subsection}{\numberline {6.1.9}Pointer types}{43}{subsection.6.1.9} +\contentsline {subsection}{\numberline {6.1.10}Array types}{43}{subsection.6.1.10} +\contentsline {subsection}{\numberline {6.1.11}Function types}{44}{subsection.6.1.11} +\contentsline {subsection}{\numberline {6.1.12}Parsing C types}{45}{subsection.6.1.12} +\contentsline {section}{\numberline {6.2}Generating C code}{45}{section.6.2} +\contentsline {chapter}{\numberline {7}The output system}{47}{chapter.7} +\contentsline {part}{IV\hspace {1em}Appendices}{49}{part.4} +\contentsline {chapter}{\numberline {A}Cutting-room floor}{51}{appendix.A} +\contentsline {section}{\numberline {A.1}Generated names}{51}{section.A.1} +\contentsline {subsection}{\numberline {A.1.1}Instance layout}{51}{subsection.A.1.1} +\contentsline {section}{\numberline {A.2}Class objects}{51}{section.A.2} +\contentsline {section}{\numberline {A.3}Classes}{53}{section.A.3} +\contentsline {subsection}{\numberline {A.3.1}Classes and superclasses}{53}{subsection.A.3.1} +\contentsline {subsection}{\numberline {A.3.2}The class precedence list}{53}{subsection.A.3.2} +\contentsline {subsection}{\numberline {A.3.3}Instances and metaclasses}{53}{subsection.A.3.3} +\contentsline {subsection}{\numberline {A.3.4}Items and inheritance}{54}{subsection.A.3.4} +\contentsline {subsubsection}{Slots}{54}{section*.25} +\contentsline {subsubsection}{Initializers}{54}{section*.26} +\contentsline {subsubsection}{Messages}{54}{section*.27} +\contentsline {subsubsection}{Methods}{55}{section*.28} +\contentsline {subsection}{\numberline {A.3.5}Chains and instance layout}{55}{subsection.A.3.5} +\contentsline {section}{\numberline {A.4}Superclass linearization}{56}{section.A.4} +\contentsline {section}{\numberline {A.5}Invariance, covariance, contravariance}{57}{section.A.5} diff --git a/doc/syntax.tex b/doc/syntax.tex new file mode 100644 index 
0000000..de85ce8 --- /dev/null +++ b/doc/syntax.tex @@ -0,0 +1,666 @@ +%%% -*-latex-*- +%%% +%%% Module syntax +%%% +%%% (c) 2015 Straylight/Edgeware +%%% + +%%%----- Licensing notice --------------------------------------------------- +%%% +%%% This file is part of the Sensible Object Design, an object system for C. +%%% +%%% SOD is free software; you can redistribute it and/or modify +%%% it under the terms of the GNU General Public License as published by +%%% the Free Software Foundation; either version 2 of the License, or +%%% (at your option) any later version. +%%% +%%% SOD is distributed in the hope that it will be useful, +%%% but WITHOUT ANY WARRANTY; without even the implied warranty of +%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +%%% GNU General Public License for more details. +%%% +%%% You should have received a copy of the GNU General Public License +%%% along with SOD; if not, write to the Free Software Foundation, +%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +\chapter{Module syntax} \label{ch:syntax} + +%%%-------------------------------------------------------------------------- + +Fortunately, Sod is syntactically quite simple. I've used some slightly +unusual notation in order to make the presentation easier to read. For any +nonterminal $x$: +\begin{itemize} +\item $\epsilon$ denotes the empty nonterminal: + \begin{quote} + $\epsilon$ ::= + \end{quote} +\item @[$x$@] means an optional $x$: + \begin{quote} + \syntax{@[$x$@] ::= $\epsilon$ @! $x$} + \end{quote} +\item $x^*$ means a sequence of zero or more $x$s: + \begin{quote} + \syntax{$x^*$ ::= $\epsilon$ @! $x^*$ $x$} + \end{quote} +\item $x^+$ means a sequence of one or more $x$s: + \begin{quote} + \syntax{$x^+$ ::= $x$ $x^*$} + \end{quote} +\item $x$@<-list> means a sequence of one or more $x$s separated + by commas: + \begin{quote} + \syntax{$x$<-list> ::= $x$ @! 
$x$<-list> "," $x$} + \end{quote} +\end{itemize} + +\subsection{Lexical syntax} +\label{sec:syntax.lex} + +Whitespace and comments are discarded. The remaining characters are +collected into tokens according to the following syntax. + +\begin{grammar} + ::= +\alt +\alt +\alt +\alt +\end{grammar} + +This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal +munch} rule: at each stage we take the longest sequence of characters which +could be a token. + +\subsubsection{Identifiers} \label{sec:syntax.lex.id} + +\begin{grammar} + ::= @^* + + ::= | "_" + + ::= @! + + ::= "A" | "B" | \dots\ | "Z" +\alt "a" | "b" | \dots\ | "z" +\alt + + ::= "0" | + + ::= "1" | "2" $| \cdots |$ "9" +\end{grammar} + +The precise definition of @ is left to the function +\textsf{alpha-char-p} in the hosting Lisp system. For portability, +programmers are encouraged to limit themselves to the standard ASCII letters. + +There are no reserved words at the lexical level, but the higher-level syntax +recognizes certain identifiers as \emph{keywords} in some contexts. There is +also an ambiguity (inherited from C) in the declaration syntax which is +settled by distinguishing type names from other identifiers at a lexical +level. + +\subsubsection{String and character literals} \label{sec:syntax.lex.string} + +\begin{grammar} + ::= "\"" @^* "\"" + + ::= "'" "'" + + ::= any character other than "\\" or "\"" +\alt "\\" + + ::= any character other than "\\" or "'" +\alt "\\" + + ::= any single character +\end{grammar} + +The syntax for string and character literals differs from~C. In particular, +escape sequences such as @`\textbackslash n' are not recognized. The use +of string and character literals in Sod, outside of C~fragments, is limited, +and the simple syntax seems adequate. For the sake of future compatibility, +the use of character sequences which resemble C escape sequences is +discouraged. 
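To make the difference from~C concrete, here is a minimal sketch of a reader for string literals of this kind. It rests on my reading of the grammar above, namely that a backslash simply quotes whichever single character follows it, so that @`\textbackslash n' yields a plain `n' rather than a newline. The function name @|read_sod_string| is invented for this example and is not part of the Sod translator.

```c
#include <stddef.h>

/* Decode a string literal which starts at src[0] == '"'.  A backslash
 * quotes the next character literally; no C-style escapes are
 * interpreted.  Returns the number of source characters consumed, or 0
 * if the literal is malformed or doesn't fit in the output buffer.
 * (Hypothetical helper, not part of Sod.) */
static size_t read_sod_string(const char *src, char *out, size_t outsz)
{
  size_t i = 0, n = 0;
  char ch;

  if (outsz == 0 || src[i] != '"') return 0;
  i++;
  for (;;) {
    ch = src[i];
    if (ch == '\0') return 0;           /* unterminated literal */
    if (ch == '"') { i++; break; }      /* closing quote */
    if (ch == '\\') {
      ch = src[++i];                    /* take the next char as-is */
      if (ch == '\0') return 0;
    }
    if (n + 1 >= outsz) return 0;       /* leave room for the NUL */
    out[n++] = ch;
    i++;
  }
  out[n] = '\0';
  return i;
}
```

Character literals could be handled in the same way, with the quote character changed and exactly one (possibly backslash-quoted) character required between the quotes.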
+ +\subsubsection{Integer literals} \label{sec:syntax.lex.int} + +\begin{grammar} + ::= +\alt +\alt +\alt + + ::= @^* + + ::= "0" @("b"|"B"@) @^+ + + ::= "0" | "1" + + ::= "0" @["o"|"O"@] @^+ + + ::= "0" | "1" $| \cdots |$ "7" + + ::= "0" @("x"|"X"@) @^+ + + ::= +\alt "A" | "B" | "C" | "D" | "E" | "F" +\alt "a" | "b" | "c" | "d" | "e" | "f" +\end{grammar} + +Sod understands only integers, not floating-point numbers; its integer syntax +goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for +binary. However, length and signedness indicators are not permitted. + +\subsubsection{Punctuation} \label{sec:syntax.lex.punct} + +\begin{grammar} + ::= any nonalphanumeric character other than "_", "\"" or "'" +\end{grammar} + +\subsubsection{Comments} \label{sec:lex-comment} + +\begin{grammar} + ::= +\alt + + ::= + "/*" + @^* @(@^+ @^*@)^* + @^* + "*/" + + ::= "*" + + ::= any character other than "*" + + ::= any character other than "*" or "/" + + ::= "//" @^* + + ::= a newline character + + ::= any character other than newline +\end{grammar} + +Comments are exactly as in C99: both traditional block comments `\texttt{/*} +\dots\ \texttt{*/}' and \Cplusplus-style `\texttt{//} \dots' comments are +permitted and ignored. + +\subsection{Special nonterminals} +\label{sec:special-nonterminals} + +Aside from the lexical syntax presented above (\xref{sec:lexical-syntax}), +two special nonterminals occur in the module syntax. + +\subsubsection{S-expressions} \label{sec:syntax-sexp} + +\begin{grammar} + ::= an S-expression, as parsed by the Lisp reader +\end{grammar} + +When an S-expression is expected, the Sod parser simply calls the host Lisp +system's \textsf{read} function. Sod modules are permitted to modify the +read table to extend the S-expression syntax. + +S-expressions are self-delimiting, so no end-marker is needed. 
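Returning to the integer literal syntax given earlier in this section, the prefix rules can be sketched as a small C function. This is an illustrative sketch rather than the translator's own code (which is written in Lisp): the name @|parse_sod_integer| is invented, overflow is not checked, and treating a bare @`0' as zero is an assumption on my part.

```c
#include <stddef.h>

/* Value of a digit character in bases up to 16, or -1 if it isn't one. */
static int digit_value(char ch)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

/* Parse a complete integer literal: decimal, `0b' binary, `0' or `0o'
 * octal, or `0x' hexadecimal.  Length and signedness suffixes are
 * rejected.  Returns 1 and stores the value on success, 0 otherwise.
 * (Hypothetical helper; no overflow checking.) */
static int parse_sod_integer(const char *s, unsigned long *out)
{
  unsigned base = 10;
  unsigned long v = 0;
  size_t i = 0, ndigits = 0;
  int d;

  if (s[0] == '0') {
    i = 1;
    switch (s[1]) {
      case 'b': case 'B': base = 2; i = 2; break;
      case 'o': case 'O': base = 8; i = 2; break;
      case 'x': case 'X': base = 16; i = 2; break;
      case '\0': *out = 0; return 1;    /* bare `0': an assumption */
      default: base = 8; break;         /* C-style leading zero */
    }
  }
  for (; s[i]; i++) {
    d = digit_value(s[i]);
    if (d < 0 || (unsigned)d >= base) return 0;
    v = v * base + (unsigned long)d;
    ndigits++;
  }
  if (!ndigits) return 0;               /* prefix with no digits */
  *out = v;
  return 1;
}
```

A real lexer would stop at the first character which can't continue the token instead of insisting on consuming the whole string; the stricter behaviour here just keeps the sketch short.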
+ +\subsubsection{C fragments} \label{sec:syntax.lex.cfrag} + +\begin{grammar} + ::= a sequence of C tokens, with matching brackets +\end{grammar} + +Sequences of C code are simply stored and written to the output unchanged +during translation. They are read using a simple scanner which nonetheless +understands C comments and string and character literals. + +A C fragment is terminated by one of a small number of delimiter characters +determined by the immediately surrounding context -- usually a closing brace +or bracket. The first such delimiter character which is not enclosed in +brackets, braces or parentheses ends the fragment. + +\subsection{Module syntax} \label{sec:syntax-module} + +\begin{grammar} + ::= @^* + + ::= +\alt +\alt +\alt +\alt +\alt +\end{grammar} + +A module is the top-level syntactic item. A module consists of a sequence of +definitions. + +\subsection{Simple definitions} \label{sec:syntax.defs} + +\subsubsection{Importing modules} \label{sec:syntax.defs.import} + +\begin{grammar} + ::= "import" ";" +\end{grammar} + +The module named @ is processed and its definitions made available. + +A search is made for a module source file as follows. +\begin{itemize} +\item The module name @ is converted into a filename by appending + @`.sod', if it has no extension already.\footnote{% + Technically, what happens is \textsf{(merge-pathnames name (make-pathname + :type "SOD" :case :common))}, so exactly what this means varies + according to the host system.} % +\item The file is looked for relative to the directory containing the + importing module. +\item If that fails, then the file is looked for in each directory on the + module search path in turn. +\item If the file still isn't found, an error is reported and the import + fails. +\end{itemize} +At this point, if the file has previously been imported, nothing further +happens.\footnote{% + This check is done using \textsf{truename}, so it should see through simple + tricks like symbolic links. 
However, it may be confused by fancy things + like bind mounts and so on.} % + +Recursive imports, either direct or indirect, are an error. + +\subsubsection{Loading extensions} \label{sec:syntax.defs.load} + +\begin{grammar} + ::= "load" ";" +\end{grammar} + +The Lisp file named @ is loaded and evaluated. + +A search is made for a Lisp source file as follows. +\begin{itemize} +\item The name @ is converted into a filename by appending @`.lisp', + if it has no extension already.\footnote{% + Technically, what happens is \textsf{(merge-pathnames name (make-pathname + :type "LISP" :case :common))}, so exactly what this means varies + according to the host system.} % +\item A search is then made in the same manner as for module imports + (\xref{sec:syntax-module}). +\end{itemize} +If the file is found, it is loaded using the host Lisp's \textsf{load} +function. + +Note that Sod doesn't attempt to compile Lisp files, or even to look for +existing compiled files. The right way to package a substantial extension to +the Sod translator is to provide the extension as a standard ASDF system (or +similar) and leave a dropping @"foo-extension.lisp" in the module path saying +something like +\begin{quote} + \textsf{(asdf:load-system :foo-extension)} +\end{quote} +which will arrange for the extension to be compiled if necessary. + +(This approach means that the language doesn't need to depend on any +particular system definition facility. It's bad enough already that it +depends on Common Lisp.) + +\subsubsection{Lisp escapes} \label{sec:syntax.defs.lisp} + +\begin{grammar} + ::= "lisp" ";" +\end{grammar} + +The @ is evaluated immediately. It can do anything it likes. + +\textbf{Warning!} This means that hostile Sod modules are a security hazard. +Lisp code can read and write files, start other programs, and make network +connections. 
Don't install Sod modules from sources that you don't +trust.\footnote{% + Presumably you were going to run the corresponding code at some point, so + this isn't as unusually scary as it sounds. But please be careful.} % + +\subsubsection{Declaring type names} \label{sec:syntax.defs.typename} + +\begin{grammar} + ::= + "typename" ";" +\end{grammar} + +Each @ is declared as naming a C type. This is important because +the C type syntax -- which Sod uses -- is ambiguous, and disambiguation is +done by distinguishing type names from other identifiers. + +Don't declare class names using @"typename"; use @"class" forward +declarations instead. + +\subsection{Literal code} \label{sec:syntax-code} + +\begin{grammar} + ::= + "code" ":" @[@] + "{" "}" + + ::= "[" "]" + + ::= @^+ +\end{grammar} + +The @ will be output unchanged to one of the output files. + +The first @ is the symbolic name of an output file. Predefined +output file names are @"c" and @"h", which are the implementation code and +header file respectively; other output files can be defined by extensions. + +The second @ provides a name for the output item. Several C +fragments can have the same name: they will be concatenated together in the +order in which they were encountered. + +The @ provide a means for specifying where in the output file +the output item should appear. (Note the two kinds of square brackets shown +in the syntax: square brackets must appear around the constraints if they are +present, but the constraints may be omitted entirely.) Each comma-separated @ +is a sequence of identifiers naming output items, and indicates that the +output items must appear in the order given -- though the translator is free +to insert additional items in between them. (The particular output items +needn't be defined already -- indeed, they needn't be defined ever.) 
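One way to picture what the constraints ask for is as a topological sort: each constraint contributes `comes before' edges between consecutive items, and any ordering which respects all of the edges is acceptable. The sketch below uses Kahn's algorithm over items numbered from zero. It is a generic illustration and makes no claim about how the Sod translator actually orders its output; @|MAXITEMS|, @|add_constraint| and @|order_items| are names made up for the example.

```c
#include <stddef.h>

#define MAXITEMS 16

/* Items are numbered 0 .. n-1; before[a][b] is nonzero if item a must
 * be written somewhere before item b.  (Hypothetical representation.) */
struct graph {
  size_t n;
  int before[MAXITEMS][MAXITEMS];
};

/* Record one constraint: the items in seq must appear in that order,
 * though other items may come between them. */
static void add_constraint(struct graph *g, const size_t *seq, size_t len)
{
  size_t i;

  for (i = 0; i + 1 < len; i++) g->before[seq[i]][seq[i + 1]] = 1;
}

/* Kahn's algorithm, always choosing the lowest-numbered item which has
 * no unwritten predecessors.  Stores the order in out[0 .. n-1] and
 * returns 1, or returns 0 if the constraints are circular. */
static int order_items(const struct graph *g, size_t *out)
{
  size_t indeg[MAXITEMS] = { 0 };
  int done[MAXITEMS] = { 0 };
  size_t a, b, k;

  for (a = 0; a < g->n; a++)
    for (b = 0; b < g->n; b++)
      if (g->before[a][b]) indeg[b]++;

  for (k = 0; k < g->n; k++) {
    for (a = 0; a < g->n; a++)
      if (!done[a] && indeg[a] == 0) break;
    if (a == g->n) return 0;            /* everything left is blocked */
    done[a] = 1; out[k] = a;
    for (b = 0; b < g->n; b++)
      if (g->before[a][b]) indeg[b]--;
  }
  return 1;
}
```

Items which are mentioned in a constraint but never defined fit naturally into this picture: they take part in the ordering but contribute no text to the output.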
+ +There is a predefined output item @"includes" in both the @"c" and @"h" +output files which is a suitable place for inserting @"\#include" +preprocessor directives in order to declare types and functions for use +elsewhere in the generated output files. + +\subsection{Property sets} \label{sec:syntax.propset} + +\begin{grammar} + ::= "[" "]" + + ::= "=" +\end{grammar} + +Property sets are a means for associating miscellaneous information with +classes and related items. By using property sets, additional information +can be passed to extensions without the need to introduce idiosyncratic +syntax. + +A property has a name, given as an @, and a value computed by +evaluating an @. The value can be one of a number of types, +though the only operators currently defined act on integer values only. + +\subsubsection{The expression evaluator} \label{sec:syntax.propset.expr} + +\begin{grammar} + ::= | "+" | "-" + + ::= | "*" | "/" + + ::= | "+" | "-" + + ::= + | | | +\alt "?" +\alt "(" ")" +\end{grammar} + +The arithmetic expression syntax is simple and standard; there are currently +no bitwise, logical, or comparison operators. + +A @ expression may be a literal or an identifier. Note that +identifiers stand for themselves: they \emph{do not} denote values. For more +fancy expressions, the syntax +\begin{quote} + @"?" @ +\end{quote} +causes the @ to be evaluated using the Lisp \textsf{eval} +function. +%%% FIXME crossref to extension docs + +\subsection{C types} \label{sec:syntax.c-types} + +Sod's syntax for C types closely mirrors the standard C syntax. A C type has +two parts: a sequence of @s and a @. In +Sod, a type must contain at least one @ (i.e., +`implicit @"int"' is forbidden), and storage-class specifiers are not +recognized. 
+ +\subsubsection{Declaration specifiers} \label{sec:syntax.c-types.declspec} + +\begin{grammar} + ::= +\alt "struct" | "union" | "enum" +\alt "void" | "char" | "int" | "float" | "double" +\alt "short" | "long" +\alt "signed" | "unsigned" +\alt + + ::= "const" | "volatile" | "restrict" + + ::= +\end{grammar} + +A @ is an identifier which has been declared as being a type name, +using the @"typename" or @"class" definitions. + +Declaration specifiers may appear in any order. However, not all +combinations are permitted. A declaration specifier must consist of zero or +more @, and one of the following, up to reordering. +\begin{itemize} +\item @ +\item @"struct" @, @"union" @, @"enum" @ +\item @"void" +\item @"char", @"unsigned char", @"signed char" +\item @"short", @"unsigned short", @"signed short" +\item @"short int", @"unsigned short int", @"signed short int" +\item @"int", @"unsigned int", @"signed int", @"unsigned", @"signed" +\item @"long", @"unsigned long", @"signed long" +\item @"long int", @"unsigned long int", @"signed long int" +\item @"long long", @"unsigned long long", @"signed long long" +\item @"long long int", @"unsigned long long int", @"signed long long int" +\item @"float", @"double", @"long double" +\end{itemize} +All of these have their usual C meanings. + +\subsubsection{Declarators} \label{sec:syntax.c-types.declarator} + +\begin{grammar} +$[k]$ ::= @^* $[k]$ + +$[k]$ ::= $k$ +\alt "(" $[k]$ ")" +\alt $[k]$ @^* + + ::= "*" @^* + + ::= "[" "]" +\alt "(" ")" + + ::= $\epsilon$ | "..." +\alt @["," "..."@] + + ::= @^+ + + ::= @[ @! $\epsilon$@] + + ::= @[@] + + ::= "." + + ::= @[@] +\end{grammar} + +The declarator syntax is taken from C, but with some differences. +\begin{itemize} +\item Array dimensions are uninterpreted @, terminated by a + closing square bracket. This allows array dimensions to contain arbitrary + constant expressions. 
+\item A declarator may have either a single @ at its centre or a + pair of @s separated by a @`.'; this is used to refer to + slots or messages defined in superclasses. +\end{itemize} +The remaining differences are (I hope) a matter of presentation rather than +substance. + +\subsection{Defining classes} \label{sec:syntax.class} + +\begin{grammar} + ::= +\alt +\end{grammar} + +\subsubsection{Forward declarations} \label{sec:class.class.forward} + +\begin{grammar} + ::= "class" ";" +\end{grammar} + +A @ informs Sod that an @ will be used +to name a class which is currently undefined. Forward declarations are +necessary in order to resolve certain kinds of circularity. For example, +\begin{listing} +class Sub; + +class Super : SodObject { + Sub *sub; +}; + +class Sub : Super { + /* ... */ +}; +\end{listing} + +\subsubsection{Full class definitions} \label{sec:class.class.full} + +\begin{grammar} + ::= + @[@] + "class" ":" + "{" @^* "}" + + ::= ";" +\alt +\alt +\alt ";" +\end{grammar} + +A full class definition provides a complete description of a class. + +The first @ gives the name of the class. It is an error to +give the name of an existing class (other than a forward-referenced class), +or an existing type name. It is conventional to give classes `MixedCase' +names, to distinguish them from other kinds of identifiers. + +The @ names the direct superclasses for the new class. It +is an error if any of these @s does not name a defined class. + +The @ provide additional information. The standard class +properties are as follows. +\begin{description} +\item[@"lisp_class"] The name of the Lisp class to use within the translator + to represent this class. The property value must be an identifier; the + default is @"sod_class". Extensions may define classes with additional + behaviour, and may recognize additional class properties. +\item[@"metaclass"] The name of the Sod metaclass for this class. 
In the + generated code, a class is itself an instance of another class -- its + \emph{metaclass}. The metaclass defines which slots the class will have, + which messages it will respond to, and what its behaviour will be when it + receives them. The property value must be an identifier naming a defined + subclass of @"SodClass". The default metaclass is @"SodClass". + %%% FIXME xref to theory +\item[@"nick"] A nickname for the class, to be used to distinguish it from + other classes in various limited contexts. The property value must be an + identifier; the default is constructed by forcing the class name to + lower-case. +\end{description} + +The class body consists of a sequence of @s enclosed in braces. +These items are discussed in the following sections. + +\subsubsection{Slot items} \label{sec:sntax.class.slot} + +\begin{grammar} + ::= + @[@] + @^+ + + ::= @["=" @] +\end{grammar} + +A @ defines one or more slots. All instances of the class and any +subclass will contain these slots, with the names and types given by the +@ and the @. Slot declarators may not +contain qualified identifiers. + +It is not possible to declare a slot with function type: such an item is +interpreted as being a @ or @. Pointers to +functions are fine. + +An @, if present, is treated as if a separate +@ containing the slot name and initializer were present. +For example, +\begin{listing} +[nick = eg] +class Example : Super { + int foo = 17; +}; +\end{listing} +means the same as +\begin{listing} +[nick = eg] +class Example : Super { + int foo; + eg.foo = 17; +}; +\end{listing} + +\subsubsection{Initializer items} \label{sec:syntax.class.init} + +\begin{grammar} + ::= @["class"@] + + ::= "=" + + ::= "{" "}" | +\end{grammar} + +An @ provides an initial value for one or more slots. If +prefixed by @"class", then the initial values are for class slots (i.e., +slots of the class object itself); otherwise they are for instance slots. 
+ +The first component of the @ must be the nickname of +one of the class's superclasses (including itself); the second must be the +name of a slot defined in that superclass. + +The initializer has one of two forms. +\begin{itemize} +\item A @ enclosed in braces denotes an aggregate initializer. + This is suitable for initializing structure, union or array slots. +\item A @ \emph{not} beginning with an open brace is a `bare' + initializer, and continues until the next @`,' or @`;' which is not within + nested brackets. Bare initializers are suitable for initializing scalar + slots, such as pointers or integers, and strings. +\end{itemize} + +\subsubsection{Message items} \label{sec:syntax.class.message} + +\begin{grammar} + ::= + @[@] + @^+ @[@] +\end{grammar} + +\subsubsection{Method items} \label{sec:syntax.class.method} + +\begin{grammar} + ::= + @[@] + @^+ + + ::= "{" "}" | "extern" ";" +\end{grammar} + + +%%%----- That's all, folks -------------------------------------------------- + +%%% Local variables: +%%% mode: LaTeX +%%% TeX-master: "sod.tex" +%%% TeX-PDF-mode: t +%%% End: diff --git a/doc/sod-tut.tex b/doc/tutorial.tex similarity index 86% rename from doc/sod-tut.tex rename to doc/tutorial.tex index ca686aa..afc6109 100644 --- a/doc/sod-tut.tex +++ b/doc/tutorial.tex @@ -23,8 +23,7 @@ %%% along with SOD; if not, write to the Free Software Foundation, %%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. -\chapter{Tutorial} -\label{ch:tut} +\chapter{Tutorial} \label{ch:tutorial} This chapter provides a tutorial introduction to the Sod object system. It intentionally misses out nitty-gritty details. If you want those, the @@ -35,7 +34,7 @@ You'll have to bear with him. If you think you can do a better job, I'm sure that he'll be grateful for your contribution. 
%%%-------------------------------------------------------------------------- -\section{Introduction} \label{sec:tut.intro} +\section{Introduction} \label{sec:tutorial.intro} Sod is an object system for the C~programming language. Because it doesn't have enough already. Actually, that's not right: it's got plenty already. @@ -68,7 +67,7 @@ means that is has the following features. There's a good chance that half of that didn't mean anything to you. Bear with me, though, because we'll explain it all eventually. -\subsection{Building programs with Sod} \label{sec:tut.intro.build} +\subsection{Building programs with Sod} \label{sec:tutorial.intro.build} Sod is basically a fancy preprocessor, in the same vein as Lex and Yacc. It reads source files written in a vaguely C-like language. It produces output @@ -84,8 +83,8 @@ The main consequences of this are as follows. \item Sod hasn't made any attempt to improve C's syntax. It's just as hostile to object-oriented programming as it ever was. This means that you'll end up writing ugly things like - \begin{prog}% - thing->_vt->foo.frob(thing, mumble);% + \begin{prog} + thing->_vt->foo.frob(thing, mumble); \end{prog} fairly frequently. This can be made somewhat less painful using macros, but we're basically stuck with C. The upside is that you know exactly what @@ -96,11 +95,11 @@ The main consequences of this are as follows. \end{itemize} Of course, this means that your build system needs to become more complicated. If you use \man{make}{1}, then something like -\begin{prog}% - SOD = sod - - .SUFFIXES: .sod .c .h - .sod.c:; \$(SOD) -tc \$< +\begin{prog} + SOD = sod \\ + \\ + .SUFFIXES: .sod .c .h \\ + .sod.c:; \$(SOD) -tc \$< \\ .sod.h:; \$(SOD) -th \$< \end{prog} ought to do the job. @@ -109,48 +108,47 @@ ought to do the job. \section{A traditional trivial introduction} The following is a simple Sod input file. 
-\begin{prog}\quad\=\quad\=\kill%
-/* -*-sod-*- */
-
-code c : includes \{
-\#include "greeter.h"
-\}
-
-code h : includes \{
-\#include
-\#include
+\begin{prog}
+/* -*-sod-*- */ \\
+\\
+code c : includes \{ \\
+\#include "greeter.h" \\
+\} \\
+\\
+code h : includes \{ \\
+\#include \\
+\#include \\
+\} \\
+\\
+class Greeter : SodObject \{ \\ \ind
+  void greet(FILE *fp) \{ \\ \ind
+    fputs("Hello, world!\textbackslash n", fp); \- \\
+  \} \- \\
 \}
-
-class Greeter : SodObject \{ \+
-  void greet(FILE *fp) \{ \+
-    fputs("Hello, world!\textbackslash n", fp); \-
-  \} \-
-\} %
 \end{prog}
 Save it as @"greeter.sod", and run
-\begin{prog}%
-sod --gc --gh greeter %
+\begin{prog}
+sod --gc --gh greeter
 \end{prog}
 This will create files @"greeter.c" and @"greeter.h" in the current
 directory. Here's how we might use such a simple thing.
-\begin{prog}\quad\=\kill%
-\#include "greeter.h"
-
-int main(void)
-\{ \+
-  struct Greeter__ilayout g_obj;
-  Greeter *g = Greeter__class->cls.init(\&g_obj);
-
-  g->_vt.greeter.greet(g, stdout);
-  return (0); \-
-\} %
+\begin{prog}
+\#include "greeter.h" \\
+\\
+int main(void) \\
+\{ \\ \ind
+  SOD_DECL(Greeter, g); \\
+  \\
+  Greeter_greet(g, stdout); \\
+  return (0); \- \\
+\}
 \end{prog}
 Compare this to the traditional
-\begin{prog}\quad\=\kill%
-\#include
-
-int main(void) \+
-  \{ fputs("Hello, world\\n", stdout); return (0); \} %
+\begin{prog}
+\#include \\
+\\
+int main(void) \\ \ind
+  \{ fputs("Hello, world@\\n", stdout); return (0); \}
 \end{prog}
 and I'm sure you'll appreciate the benefits of using Sod already -- mostly to
 do with finger exercise. Trust me, it gets more useful.
@@ -161,13 +159,13 @@ it (after the comment which tells Emacs how to cope with it).
 The first part consists of the two @"code" stanzas. Both of them define
 gobbets of raw C code to copy into output files.
 The first one, @"code~: c"~\ldots, says that
-\begin{prog}%
-  \#include "greeter.h" %
+\begin{prog}
+  \#include "greeter.h"
 \end{prog}
 needs to appear in the generated @|greeter.c| file; the second says that
-\begin{prog}%
-  \#include
-  \#include %
+\begin{prog}
+  \#include \\
+  \#include
 \end{prog}
 needs to appear in the header file @|greeter.h|. The generated C files need
 to get declarations for external types and functions (e.g., @"FILE" and
@@ -176,10 +174,10 @@ declarations from the corresponding @".h" file. Sod takes a very simple
 approach to all of this: it expects you, the programmer, to deal with it.
 
 The basic syntax for @"code" stanzas is
-\begin{prog}\quad\=\kill%
-  code @ : @ \{
-  \> @
-  \} %
+\begin{prog}
+  code @ : @ \{ \\ \ind
+    @ \- \\
+  \}
 \end{prog}
 The @ is either @"c" or @"h", and says which output file the code
 wants to be written to. The @ is a name which explains where in the
@@ -193,13 +191,13 @@ message.
 
 So far, so good. The C code, which we thought we understood, contains some
 bizarre looking runes. Let's take it one step at a time.
-\begin{prog}%
-  struct Greeter__ilayout g_obj; %
+\begin{prog}
+  struct Greeter__ilayout g_obj;
 \end{prog}
 allocates space for an instance of class @"Greeter". We're not going to use
 this space directly. Instead, we do this frightening looking thing.
-\begin{prog}%
-  Greeter *g = Greeter__class->cls.init(\&g_obj); %
+\begin{prog}
+  Greeter *g = Greeter__class->cls.init(\&g_obj);
 \end{prog}
 Taking it slowly: @"Greeter__class" is a pointer to the object that
 represents our class @"Greeter". This object contains a member, named
@@ -209,8 +207,8 @@ the instance, which we use in preference to grovelling about in the
 @"ilayout" structure.
 
 Having done this, we `send the instance a message':
-\begin{prog}%
-  g->_vt->greeter.greet(g, stdout); %
+\begin{prog}
+  g->_vt->greeter.greet(g, stdout);
 \end{prog}
 This looks horrific, and seems to repeat itself quite unnecessarily. The
 first @"g" is the recipient of our `message'. The second is indeed a copy of