%%% -*-latex-*-
%%%
%%% Module syntax
%%%
%%% (c) 2015 Straylight/Edgeware
%%%

%%%----- Licensing notice ---------------------------------------------------
%%%
%%% This file is part of the Sensible Object Design, an object system for C.
%%%
%%% SOD is free software; you can redistribute it and/or modify
%%% it under the terms of the GNU General Public License as published by
%%% the Free Software Foundation; either version 2 of the License, or
%%% (at your option) any later version.
%%%
%%% SOD is distributed in the hope that it will be useful,
%%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%%% GNU General Public License for more details.
%%%
%%% You should have received a copy of the GNU General Public License
%%% along with SOD; if not, write to the Free Software Foundation,
%%% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

\chapter{Module syntax} \label{ch:syntax}

%%%--------------------------------------------------------------------------
\section{Lexical syntax} \label{sec:syntax.lex}

Whitespace and comments are discarded.  The remaining characters are
collected into tokens according to the following syntax.

\begin{grammar}
<token> ::= <identifier>
\alt <string-literal>
\alt <char-literal>
\alt <integer-literal>
\alt <punctuation>
\end{grammar}

This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal
munch} rule: at each stage we take the longest sequence of characters which
could be a token.


\subsection{Identifiers} \label{sec:syntax.lex.id}

\begin{grammar}
<identifier> ::= <id-start-char> @<id-body-char>^*

<id-start-char> ::= <alpha-char> | "_"

<id-body-char> ::= <id-start-char> @! <digit-char>

<alpha-char> ::= "A" | "B" | \dots\ | "Z"
\alt "a" | "b" | \dots\ | "z"
\alt <extended-alpha-char>

<digit-char> ::= "0" | <nonzero-digit-char>

<nonzero-digit-char> ::= "1" | "2" $| \ldots |$ "9"
\end{grammar}

The precise definition of @<alpha-char> is left to the function
@|alpha-char-p| in the hosting Lisp system.  For portability, programmers are
encouraged to limit themselves to the standard ASCII letters.

There are no reserved words at the lexical level, but the higher-level syntax
recognizes certain identifiers as \emph{keywords} in some contexts.  There is
also an ambiguity (inherited from C) in the declaration syntax which is
settled by distinguishing type names from other identifiers at a lexical
level.
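As an illustration only, the following C function tests whether a string is an @<identifier>.  Its name, @|is_sod_identifier|, is hypothetical and not part of Sod.  It follows the portability advice above by treating only the ASCII letters as @<alpha-char>s, so it may reject names that a particular Lisp's @|alpha-char-p| would accept.  Because there are no reserved words at this level, a string such as @|bool| is accepted.

```c
/* Letters are restricted to ASCII, as recommended for portability;
   this assumes an ASCII-compatible execution character set. */
static int is_id_start(char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static int is_id_body(char c)
{
  return is_id_start(c) || (c >= '0' && c <= '9');
}

/* Nonzero if s is exactly one <identifier>. */
int is_sod_identifier(const char *s)
{
  if (!is_id_start(*s)) return 0;          /* also rejects the empty string */
  for (s++; *s; s++)
    if (!is_id_body(*s)) return 0;
  return 1;
}
```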


\subsection{String and character literals} \label{sec:syntax.lex.string}

\begin{grammar}
<string-literal> ::= "\"" @<string-literal-char>^* "\""

<char-literal> ::= "'" <char-literal-char> "'"

<string-literal-char> ::= any character other than "\\" or "\""
\alt "\\" <char>

<char-literal-char> ::= any character other than "\\" or "'"
\alt "\\" <char>

<char> ::= any single character
\end{grammar}

The syntax for string and character literals differs from~C.  In particular,
escape sequences such as @`\textbackslash n' are not recognized.  The use
of string and character literals in Sod, outside of C~fragments, is limited,
and the simple syntax seems adequate.  For the sake of future compatibility,
the use of character sequences which resemble C escape sequences is
discouraged.
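The following C sketch reads a string literal.  It assumes, as the grammar suggests, that a backslash simply makes the following character part of the string, so that @`\textbackslash n' yields the letter @`n' rather than a newline.  The function @|read_sod_string| is hypothetical; character literals could be read in the same way, with a single quote as the delimiter.

```c
#include <stddef.h>

/* Read a <string-literal> starting at s, which must point at the opening
   double quote.  The contents are copied to buf (size n, NUL-terminated,
   silently truncated if too long).  Returns a pointer just past the
   closing quote, or NULL if the literal is malformed or unterminated.
   Assumption: backslash quotes the next character; no C escapes. */
const char *read_sod_string(const char *s, char *buf, size_t n)
{
  size_t len = 0;

  if (*s++ != '"') return NULL;
  for (;;) {
    char c = *s++;
    if (c == '\0') return NULL;            /* ran off the end of the input */
    if (c == '"') break;                   /* closing quote */
    if (c == '\\') {                       /* take the next character as-is */
      c = *s++;
      if (c == '\0') return NULL;
    }
    if (len + 1 < n) buf[len++] = c;
  }
  if (n > 0) buf[len] = '\0';
  return s;
}
```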


\subsection{Integer literals} \label{sec:syntax.lex.int}

\begin{grammar}
<integer-literal> ::= <decimal-integer>
\alt <binary-integer>
\alt <octal-integer>
\alt <hex-integer>

<decimal-integer> ::= "0" | <nonzero-digit-char> @<digit-char>^*

<binary-integer> ::= "0" @("b"|"B"@) @<binary-digit-char>^+

<binary-digit-char> ::= "0" | "1"

<octal-integer> ::= "0" @["o"|"O"@] @<octal-digit-char>^+

<octal-digit-char> ::= "0" | "1" $| \ldots |$ "7"

<hex-integer> ::= "0" @("x"|"X"@) @<hex-digit-char>^+

<hex-digit-char> ::= <digit-char>
\alt "A" | "B" | "C" | "D" | "E" | "F"
\alt "a" | "b" | "c" | "d" | "e" | "f"
\end{grammar}

Sod understands only integers, not floating-point numbers; its integer syntax
goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for
binary.  However, C's length and signedness suffixes, such as @`L' and
@`U', are not permitted.
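To make the integer syntax concrete, here is a C sketch of a hypothetical function @|parse_sod_integer| which checks whether an entire string is a single @<integer-literal> and computes its value.  Following the grammar, a @`0' followed by octal digits is read as octal even without the @`o', just as in C.  Overflow is not detected, and a suffix such as @`L' causes the string to be rejected.

```c
/* Value of a hex digit character, or -1 if c is not one. */
static int digit_value(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Returns 1 and stores the value in *out if s is exactly one
   <integer-literal>; returns 0 otherwise.  Overflow is not checked. */
int parse_sod_integer(const char *s, unsigned long *out)
{
  unsigned base = 10;
  unsigned long v = 0;
  int d, ndigits = 0;

  if (s[0] == '0') {
    switch (s[1]) {
      case 'x': case 'X': base = 16; s += 2; break;  /* 0x... */
      case 'b': case 'B': base = 2; s += 2; break;   /* 0b... */
      case 'o': case 'O': base = 8; s += 2; break;   /* 0o... */
      case '\0': *out = 0; return 1;                 /* plain "0" */
      default: base = 8; s += 1; break;              /* 0..., C-style octal */
    }
  }
  for (; *s; s++) {
    d = digit_value((unsigned char)*s);
    if (d < 0 || (unsigned)d >= base) return 0;      /* bad digit or suffix */
    v = v*base + (unsigned)d;
    ndigits++;
  }
  if (ndigits == 0) return 0;                         /* e.g. "0x" alone */
  *out = v;
  return 1;
}
```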


\subsection{Punctuation} \label{sec:syntax.lex.punct}

\begin{grammar}
<punctuation> ::= any nonalphanumeric character other than "_", "\"" or "'"
\end{grammar}


\subsection{Comments} \label{sec:syntax.lex.comment}

\begin{grammar}
<comment> ::= <block-comment>
\alt <line-comment>

<block-comment> ::=
  "/*"
  @<not-star>^* @(@<star>^+ <not-star-or-slash> @<not-star>^*@)^*
  @<star>^*
  "*/"

<star> ::= "*"

<not-star> ::= any character other than "*"

<not-star-or-slash> ::= any character other than "*" or  "/"

<line-comment> ::= "/\,/" @<not-newline>^* <newline>

<newline> ::= a newline character

<not-newline> ::= any character other than newline
\end{grammar}

Comments are exactly as in C99: both traditional block comments `@|/*| \dots\
@|*/|' and \Cplusplus-style `@|/\,/| \dots' comments are permitted and
ignored.
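The following C sketch, using a hypothetical function @|skip_comment|, skips a comment at the start of a string.  A block comment ends at the first @|*/|, so comments do not nest.  Accepting a line comment which ends at the end of the input, with no final newline, is an assumption beyond the grammar above.

```c
#include <stddef.h>

/* If s begins with a comment, return a pointer just past it; return NULL
   for an unterminated block comment; otherwise return s unchanged. */
const char *skip_comment(const char *s)
{
  if (s[0] == '/' && s[1] == '*') {            /* block comment */
    for (s += 2; *s; s++)
      if (s[0] == '*' && s[1] == '/')
        return s + 2;                          /* first star-slash ends it */
    return NULL;                               /* unterminated */
  }
  if (s[0] == '/' && s[1] == '/') {            /* line comment */
    s += 2;
    while (*s && *s != '\n') s++;
    return *s ? s + 1 : s;                     /* consume the newline too */
  }
  return s;                                    /* not a comment */
}
```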


\subsection{Special nonterminals} \label{sec:syntax.lex.special}

Aside from the lexical syntax presented above (\xref{sec:syntax.lex}),
two special nonterminals occur in the module syntax.

\subsubsection{S-expressions}
\begin{grammar}
<s-expression> ::= an S-expression, as parsed by the Lisp reader
\end{grammar}

When an S-expression is expected, the Sod parser simply calls the host Lisp
system's @|read| function.  Sod modules are permitted to modify the
readtable to extend the S-expression syntax.

S-expressions are self-delimiting, so no end-marker is needed.

\subsubsection{C fragments}
\begin{grammar}
<c-fragment> ::= a sequence of C tokens, with matching brackets
\end{grammar}

Sequences of C code are simply stored and written to the output unchanged
during translation.  They are read using a simple scanner which nonetheless
understands C comments and string and character literals.

A C fragment is terminated by one of a small number of delimiter characters
determined by the immediately surrounding context -- usually some kind of
bracket.  The first such delimiter character which is not enclosed in
brackets, braces or parentheses ends the fragment.
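A simplified C sketch of this delimiter rule follows.  The name @|find_fragment_end| is hypothetical, and the real scanner is part of the Lisp translator.  The sketch tracks bracket depth, skips string and character literals and comments, and stops at the first delimiter found at depth zero.  Unlike a careful implementation, it does not check that each closing bracket matches the kind of the corresponding opening one.

```c
#include <string.h>

/* Return a pointer to the first character of s which appears in delims
   and is not nested inside brackets, a literal, or a comment; return
   NULL if there is none, or if a literal or comment is unterminated. */
const char *find_fragment_end(const char *s, const char *delims)
{
  int depth = 0;

  for (; *s; s++) {
    char c = *s;
    if (depth == 0 && strchr(delims, c)) return s;
    switch (c) {
      case '(': case '[': case '{': depth++; break;
      case ')': case ']': case '}': depth--; break;
      case '"': case '\'':                     /* skip a literal */
        for (s++; *s && *s != c; s++)
          if (*s == '\\' && s[1]) s++;         /* backslash hides next char */
        if (!*s) return NULL;
        break;
      case '/':
        if (s[1] == '*') {                     /* block comment */
          const char *e = strstr(s + 2, "*/");
          if (!e) return NULL;
          s = e + 1;                           /* loop step moves past it */
        } else if (s[1] == '/') {              /* line comment */
          while (s[1] && s[1] != '\n') s++;
        }
        break;
    }
  }
  return NULL;
}
```

For example, with @`\}' as the only delimiter, the brace inside the character literal and the one inside the comment in @|a[i] + f(x, '\}') /* \} */ \}| are skipped, and the scan stops at the final brace.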

%%%--------------------------------------------------------------------------
\section{C types} \label{sec:syntax.type}

Sod's syntax for C types closely mirrors the standard C syntax.  A C type has
two parts: a sequence of @<declaration-specifier>s and a @<declarator>.  In
Sod, a type must contain at least one @<declaration-specifier> (i.e.,
`implicit @|int|' is forbidden), and storage-class specifiers are not
recognized.


\subsection{Declaration specifiers} \label{sec:syntax.type.declspec}

\begin{grammar}
<declaration-specifier> ::= <type-name>
\alt "struct" <identifier> | "union" <identifier> | "enum" <identifier>
\alt "void" | "char" | "int" | "float" | "double"
\alt "short" | "long"
\alt "signed" | "unsigned"
\alt "bool" | "_Bool"
\alt "imaginary" | "_Imaginary" | "complex" | "_Complex"
\alt <qualifier>
\alt <storage-specifier>
\alt <atomic-type>
\alt <other-declspec>

<qualifier> ::= <atomic> | "const" | "volatile" | "restrict"

<plain-type> ::= @<declaration-specifier>^+ <abstract-declarator>

<atomic-type> ::= <atomic> "(" <plain-type> ")"

<atomic> ::= "atomic" | "_Atomic"

<storage-specifier> ::= <alignas> "(" <c-fragment> ")"

<alignas> ::= "alignas" | "_Alignas"

<type-name> ::= <identifier>
\end{grammar}

Declaration specifiers may appear in any order.  However, not all
combinations are permitted.  A sequence of declaration specifiers must
consist of zero or more @<qualifier>s, zero or more @<storage-specifier>s,
and exactly one of the following, up to reordering:
\begin{itemize}
\item @<type-name>;
\item @<atomic-type>;
\item @"struct" @<identifier>; @"union" @<identifier>; @"enum" @<identifier>;
\item @"void";
\item @"_Bool", @"bool";
\item @"char"; @"unsigned char"; @"signed char";
\item @"short", @"signed short", @"short int", @"signed short int";
  @"unsigned short", @"unsigned short int";
\item @"int", @"signed", @"signed int"; @"unsigned", @"unsigned int";
\item @"long", @"signed long", @"long int", @"signed long int"; @"unsigned
  long", @"unsigned long int";
\item @"long long", @"signed long long", @"long long int", @"signed long long
  int"; @"unsigned long long", @"unsigned long long int";
\item @"float"; @"double"; @"long double";
\item @"float _Imaginary", @"float imaginary"; @"double _Imaginary", @"double
  imaginary"; @"long double _Imaginary", @"long double imaginary";
\item @"float _Complex", @"float complex"; @"double _Complex", @"double
  complex"; @"long double _Complex", @"long double complex".
\end{itemize}
Items separated by commas mean the same thing, and Sod will not preserve the
distinction between them; items separated by semicolons are distinct.

All of these have their usual C meanings, with some minor differences:
\begin{itemize}
\item In C, the `tag' namespace is shared between @|struct|, @|union|, and
  @|enum|; Sod has three distinct namespaces for tags.  This may be fixed in
  the future.
\item The @<other-declspec> production is a syntactic extension point, where
  extensions can introduce their own additions to the type system.
\end{itemize}
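The synonym groups listed above can be illustrated by a C sketch which counts the specifier keywords, since their order does not matter, and maps each permitted combination to a single spelling.  The function names and the choice of canonical spellings are this sketch's own rather than Sod's.  Qualifiers, type names, tagged types, and the complex and imaginary types are left out for brevity.

```c
#include <string.h>

enum { K_VOID, K_BOOL, K_CHAR, K_SHORT, K_INT, K_LONG,
       K_SIGNED, K_UNSIGNED, K_FLOAT, K_DOUBLE, K_N };

/* Map a specifier word to its index, or -1 if this sketch ignores it. */
static int keyword_index(const char *w)
{
  static const char *const names[K_N] = {
    "void", "_Bool", "char", "short", "int", "long",
    "signed", "unsigned", "float", "double"
  };
  int i;
  if (strcmp(w, "bool") == 0) return K_BOOL;   /* macro spelling of _Bool */
  for (i = 0; i < K_N; i++)
    if (strcmp(w, names[i]) == 0) return i;
  return -1;
}

/* Return one spelling for the n words (in any order), or NULL if they
   do not form a permitted combination. */
const char *canonical_type(const char *const *words, int n)
{
  static const char *const ints[2][4] = {
    { "short", "int", "long", "long long" },
    { "unsigned short", "unsigned int", "unsigned long", "unsigned long long" }
  };
  int c[K_N] = { 0 }, i, k;

  for (i = 0; i < n; i++) {                    /* count each keyword */
    if ((k = keyword_index(words[i])) < 0) return NULL;
    c[k]++;
  }
  if (c[K_SIGNED] + c[K_UNSIGNED] > 1) return NULL;

  if (n == 1 && c[K_VOID]) return "void";
  if (n == 1 && c[K_BOOL]) return "_Bool";
  if (n == 1 && c[K_FLOAT]) return "float";
  if (c[K_DOUBLE] == 1 && c[K_LONG] <= 1 && n == 1 + c[K_LONG])
    return c[K_LONG] ? "long double" : "double";
  if (c[K_CHAR] == 1 && n == 1 + c[K_SIGNED] + c[K_UNSIGNED])
    return c[K_UNSIGNED] ? "unsigned char"
         : c[K_SIGNED] ? "signed char" : "char";   /* three distinct types */

  /* Anything else must be an integer type built from short/long/int. */
  if (n == 0 || c[K_VOID] || c[K_BOOL] || c[K_FLOAT] || c[K_DOUBLE] ||
      c[K_CHAR] || c[K_INT] > 1 || c[K_SHORT] > 1 || c[K_LONG] > 2 ||
      (c[K_SHORT] && c[K_LONG]))
    return NULL;                               /* n == 0 is implicit int */
  k = c[K_SHORT] ? 0 : c[K_LONG] == 0 ? 1 : c[K_LONG] == 1 ? 2 : 3;
  return ints[c[K_UNSIGNED]][k];
}
```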

C standards from C99 onwards have tended to introduce new keywords beginning
with an underscore followed by an uppercase letter, so as to avoid conflicts
with existing code.  More conventional spellings are then provided by macros
in new header files.  For example, C99 introduced @"_Bool", and a header file
@|<stdbool.h>| which defines the macro @|bool|.  Sod recognizes both the ugly
underscore names and the more conventional macro names on input, but always
emits the ugly names.  This doesn't cause a compatibility problem in Sod,
because Sod's parser recognizes keywords only in the appropriate context.
For example, the (ill-advised) slot declaration
\begin{prog}
  bool bool;
\end{prog}
is completely acceptable, and will cause the C structure member
\begin{prog}
  \_Bool bool;
\end{prog}
to be emitted on output, which will be acceptable to a C99 or C11 compiler
as long as @|<stdbool.h>| is not included.  (C23 makes @|bool| a keyword, so
a C23 compiler will reject it.)

A @<type-name> is an identifier which has been declared as being a type name,
using the @"typename" or @"class" definitions.  The following type names are
defined in the built-in module.
\begin{itemize}
\item @|va_list|
\item @|size_t|
\item @|ptrdiff_t|
\item @|wchar_t|
\end{itemize}


\subsection{Declarators} \label{sec:syntax.type.declarator}

\begin{grammar}
<declarator>$[k, a]$ ::= @<pointer>^* <primary-declarator>$[k, a]$

<primary-declarator>$[k, a]$ ::= $k$
\alt "(" <primary-declarator>$[k, a]$ ")"
\alt <primary-declarator>$[k, a]$ @<declarator-suffix>$[a]$

<pointer> ::= "*" @<qualifier>^*

<declarator-suffix>$[a]$ ::= "[" <c-fragment> "]"
\alt "(" $a$ ")"

<argument-list> ::= $\epsilon$ | "\dots"
\alt <list>$[\mbox{@<argument>}]$ @["," "\dots"@]

<argument> ::= @<declaration-specifier>^+ <argument-declarator>

<abstract-declarator> ::= <declarator>$[\epsilon, \mbox{@<argument-list>}]$

<argument-declarator> ::=
  <declarator>$[\mbox{@<identifier> @! $\epsilon$}, \mbox{@<argument-list>}]$

<simple-declarator> ::=
  <declarator>$[\mbox{@<identifier>}, \mbox{@<argument-list>}]$
\end{grammar}

The declarator syntax is taken from C, but with some differences.
\begin{itemize}
\item Array dimensions are uninterpreted @<c-fragment>s, terminated by a
  closing square bracket.  This allows array dimensions to contain arbitrary
  constant expressions.
\item A declarator may have either a single @<identifier> at its centre or a
  pair of @<identifier>s separated by a @`.'; this is used to refer to
  slots or messages defined in superclasses.
\end{itemize}
The remaining differences are (I hope) a matter of presentation rather than
substance.

There is additional syntax to support messages and methods which accept
keyword arguments.

\begin{grammar}
<keyword-argument> ::= <argument> @["=" <c-fragment>@]

<keyword-argument-list> ::=
  @[<list>$[\mbox{@<argument>}]$@]
  "?" @[<list>$[\mbox{@<keyword-argument>}]$@]

<method-argument-list> ::= <argument-list> @! <keyword-argument-list>

<dotted-name> ::= <identifier> "." <identifier>

<keyword-declarator>$[k]$ ::=
  <declarator>$[k, \mbox{@<method-argument-list>}]$
\end{grammar}

%%%--------------------------------------------------------------------------
\section{Properties} \label{sec:syntax.prop}

\begin{grammar}
<properties> ::= "[" <list>$[\mbox{@<property>}]$ "]"

<property> ::= <identifier> "=" <expression>

<expression> ::= <term> | <expression> "+" <term> | <expression> "--" <term>

<term> ::= <factor> | <term> "*" <factor> | <term> "/" <factor>

<factor> ::= <primary> | "+" <factor> | "--" <factor>

<primary> ::=
     <integer-literal> | <string-literal> | <char-literal> | <identifier>
\alt "<" <plain-type> ">"
\alt "{" <c-fragment> "}"
\alt "?" <s-expression>
\alt "(" <expression> ")"
\end{grammar}

\emph{Property sets} are a means for associating miscellaneous information
with compile-time metaobjects such as modules, classes, messages, methods,
slots, and initializers.  By using property sets, additional information can
be passed to extensions without the need to introduce idiosyncratic syntax.
(That said, extensions can add additional first-class syntax, if necessary.)

An error is reported if an unrecognized property is associated with an
object.


\subsection{Property values} \label{sec:syntax.prop.value}

A property has a name, given as an @<identifier>, and a value computed by
evaluating an @<expression>.  The value can be one of a number of types.

\begin{itemize}

\item An @<integer-literal> denotes a value of type @|int|.

\item Similarly, a @<string-literal> or a @<char-literal> denotes a value of
  type @|string| or @|char| respectively.  Note that, as properties,
  characters are quite distinct from integers, whereas in C, a character
  literal denotes a value of type @|int|.

\item There are no variables in the property-value syntax.  Rather, an
  @<identifier> denotes that identifier, as a value of type @|id|.

\item A C type (a @<plain-type>, as described in \xref{sec:syntax.type})
  between angle brackets, e.g., @|<int>|, or @|<char *>|, or @|<void (*(int,
  void (*)(int)))(int)>|, denotes that C type, as a value of type @|type|.

\item A @<c-fragment> within braces denotes the tokens between (and not
  including) the braces, as a value of type @|c-fragment|.

\end{itemize}
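The last example type above is the type of a function which accepts an @|int| and a pointer to a function from @|int| to @|void|, and returns a pointer to a function of the same kind; this is the type of the standard C @|signal| function.  The following C sketch, in which all names are hypothetical, spells it both directly and with intermediate typedefs, and uses a C11 @|_Generic| check to confirm that the two spellings agree.

```c
/* <void (*(int, void (*)(int)))(int)>, written directly as a typedef:
   a function taking an int and a pointer to a function (int) -> void,
   and returning a pointer to a function (int) -> void. */
typedef void (*signal_like(int, void (*)(int)))(int);

/* The same type, built up in steps. */
typedef void handler(int);                   /* function (int) -> void */
typedef handler *signal_like_steps(int, handler *);

/* C11: the two spellings must denote compatible types. */
_Static_assert(_Generic((signal_like *)0, signal_like_steps *: 1, default: 0),
               "both spellings should denote the same type");

static int last_signal = 0;
static void record(int sig) { last_signal = sig; }
static handler *current = record;

/* A function with exactly this type: install h, return the old handler. */
static void (*swap_handler(int sig, void (*h)(int)))(int)
{
  handler *old = current;
  (void)sig;                                 /* unused in this sketch */
  current = h;
  return old;
}
```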

As shown in the grammar, there are four binary operators, @"+" (addition),
@"--" (subtraction), @"*" (multiplication), and @"/" (division);
multiplication and division have higher precedence than addition and
subtraction, and operators of the same precedence associate left-to-right.
There are also unary @"+" (no effect) and @"--" (negation) operators, with
higher precedence.  All of the above operators act only on integer operands
and yield integer results.  (Although the unary @"+" operator yields its
operand unchanged, an error is still reported if it is applied to a
non-integer value.)  There are currently no bitwise, logical, or comparison
operators.
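A minimal recursive-descent evaluator for this integer subset may help to show the precedence and associativity rules.  It is a C sketch, not Sod's own implementation: it handles only decimal literals and parentheses, performs no error checking, and uses C's truncating division, which may not match the rounding used by the translator for negative operands.  The entry point @|eval_property_expression| is a hypothetical name.

```c
#include <ctype.h>

static const char *cursor;                   /* current position in input */

static void skip_space(void)
{
  while (isspace((unsigned char)*cursor)) cursor++;
}

static long expression(void);

/* <primary>: a decimal literal or a parenthesized <expression>. */
static long primary(void)
{
  long v = 0;
  skip_space();
  if (*cursor == '(') {
    cursor++;
    v = expression();
    skip_space();
    if (*cursor == ')') cursor++;            /* no error checking */
    return v;
  }
  while (isdigit((unsigned char)*cursor))
    v = 10*v + (*cursor++ - '0');
  return v;
}

/* <factor>: unary + and - bind most tightly. */
static long factor(void)
{
  skip_space();
  if (*cursor == '+') { cursor++; return factor(); }
  if (*cursor == '-') { cursor++; return -factor(); }
  return primary();
}

/* <term>: looping makes * and / associate left-to-right. */
static long term(void)
{
  long v = factor();
  for (;;) {
    skip_space();
    if (*cursor == '*') { cursor++; v *= factor(); }
    else if (*cursor == '/') { cursor++; v /= factor(); }
    else return v;
  }
}

/* <expression>: + and - have the lowest precedence. */
static long expression(void)
{
  long v = term();
  for (;;) {
    skip_space();
    if (*cursor == '+') { cursor++; v += term(); }
    else if (*cursor == '-') { cursor++; v -= term(); }
    else return v;
  }
}

long eval_property_expression(const char *s)
{
  cursor = s;
  return expression();
}
```

For instance, @|10 - 4 - 3| evaluates to 3, not 9, because subtraction associates to the left.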

Finally, an S-expression preceded by @|?| causes the expression to be read in
the current package (which is always @|sod-user| at the start of a module)
and immediately evaluated (using @|eval|); the resulting value is converted
into a property value using the \descref{gf}{decode-property}[generic
function].


\subsection{Property output types and coercions}
\label{sec:syntax.prop.coerce}

When a property value is inspected by the Sod translator, or an extension, it
is \emph{coerced} so as to conform to a requested output type.  This coercion
process is performed by the \descref{gf}{coerce-property-value}[generic
function], and additional output types and coercions can be defined by
extensions.  The built-in output types, and the coercions to them from the
value types listed above, are as follows.

\begin{itemize}

\item The output types @|int|, @|string|, @|char|, @|id|, and @|c-fragment|
  correspond to the like-named value types described above.  No coercions to
  these output types are defined for the described value types.\footnote{%
    There is a coercion to @|id| from the value type @|symbol|, but it is
    only possible to generate a property value of type @|symbol| using Lisp.}

\item The output type @|type| denotes a C type, as does the value type
  @|type|.  In addition, a value of type @|id| can be coerced to a C type if
  it is the name of a class, a type name explicitly declared by @|typename|,
  or it is one of: @|bool|, @|_Bool|, @|void|, @|char|, @|short|, @|int|,
  @|signed|, @|unsigned|, @|long|, @|size_t|, @|ptrdiff_t|, @|wchar_t|,
  or @|va_list|.

\item The @|boolean| output type denotes a boolean value, which may be either
  true or false.  A value of type @|id| is considered true if it is @|true|,
  @|t|, @|yes|, @|on|, or @|verily|; or false if it is @|false|, @|nil|,
  @|no|, @|off|, or @|nowise|; it is erroneous to provide any other
  identifier where a boolean value is wanted.  A value of type @|int| is
  considered true if it is nonzero, or false if it is zero.

\item The @|symbol| output type denotes a Lisp symbol.

  A value of type @|id| is coerced to a symbol as follows.  First, the
  identifier name is subjected to \emph{case inversion}: if all of the
  letters in the name have the same case, either upper or lower, then they
  are replaced with the corresponding letters in the opposite case, lower or
  upper; if the name contains letters of both cases, then it is not changed.
  For example, @|foo45| becomes @|FOO45|, or \emph{vice-versa}; but @|Splat|
  remains as it is.  Second, the name is subjected to \emph{separator
  switching}: all underscores in the name are replaced with hyphens (and
  \emph{vice-versa}, though hyphens aren't permitted in identifiers in the
  first place).  Finally, the resulting name is interned in the current
  package, which will usually be @|sod-user| unless changed explicitly by the
  module.

  A value of type @|string| is coerced to a symbol as follows.  If the string
  contains no colons, then it is case-inverted (but not separator-switched)
  and interned in the current package.  Otherwise, the string either has the
  form $p @|:| q$, where $q$ does not begin with a colon (the
  \emph{single-colon} case) or $p @|::| q$ (the \emph{double-colon} case);
  where $p$ does not contain a colon.  Both $p$ and $q$ are case-inverted
  (but not separator-switched).  If $p$ does not name a package, then an
  error is reported; as a special case, if $p$ is empty, then it is
  considered to name the @|keyword| package.  Otherwise, $q$ is looked up as
  a symbol name in package~$p$; in the single-colon case, if the symbol is
  not an exported symbol in package~$p$, then an error is reported; in the
  double-colon case, $q$ is interned in package~$p$ (and so there needn't be
  an exported symbol -- or, indeed, any symbol at all -- named $q$
  beforehand).

\item The @|keyword| output type denotes symbols within the @|keyword|
  package.  Values of type @|id| or @|string| can be coerced to a @|keyword|
  in the same way as to a @|symbol|, as described above, except that the
  converted name is looked up in the @|keyword| package rather than the current
  package.  (A @|string| can override this by specifying an explicit package
  name, but this is unlikely to be very helpful.)

\end{itemize}
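The first two steps of the @|id|-to-@|symbol| coercion, case inversion and separator switching, can be sketched in C as follows.  The function @|id_to_symbol_name| is hypothetical, interning the result in a package is not modelled, and only ASCII letters are considered.

```c
#include <ctype.h>

/* Write the Lisp symbol name for identifier id into out, which must have
   room for as many characters as id plus a terminating NUL. */
void id_to_symbol_name(const char *id, char *out)
{
  int upper = 0, lower = 0;
  const char *s;

  for (s = id; *s; s++) {                /* which cases occur in the name? */
    if (isupper((unsigned char)*s)) upper = 1;
    else if (islower((unsigned char)*s)) lower = 1;
  }
  for (; *id; id++) {
    unsigned char c = (unsigned char)*id;
    if (upper != lower)                  /* single-case name: invert it */
      c = upper ? tolower(c) : toupper(c);
    if (c == '_') c = '-';               /* separator switching */
    else if (c == '-') c = '_';
    *out++ = (char)c;
  }
  *out = '\0';
}
```

With this sketch, @|foo45| becomes @|FOO45| and @|my_slot| becomes @|MY-SLOT|, while @|Splat| is unchanged.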

%%%--------------------------------------------------------------------------
\section{Module syntax} \label{sec:syntax.module}

\begin{grammar}
<module> ::= @<definition>^*

<definition> ::= <property-definition> \fixme{undefined}
\alt <import-definition>
\alt <load-definition>
\alt <lisp-definition>
\alt <code-definition>
\alt <typename-definition>
\alt <class-definition>
\alt <other-definition> \fixme{undefined}
\end{grammar}

A @<module> is the top-level syntactic item: a source file presented to Sod
is expected to conform with the @<module> syntax.

A module consists of a sequence of definitions.

\fixme{describe syntax; expand}
Properties:
\begin{description}
\item[@|module_class|] A symbol naming the Lisp class to use to
  represent the module.
\item[@|guard|] An identifier to use as the guard symbol used to prevent
  multiple inclusion in the header file.
\end{description}


\subsection{Simple definitions} \label{sec:syntax.module.simple}

\subsubsection{Importing modules}
\begin{grammar}
<import-definition> ::= "import" <string-literal> ";"
\end{grammar}

The module named by the @<string-literal> is processed and its definitions
made available.

A search is made for a module source file as follows.
\begin{itemize}
\item The module name is converted into a filename by appending
  @`.sod', if it has no extension already.\footnote{%
    Technically, what happens is @|(merge-pathnames name (make-pathname :type
    "SOD" :case :common))|, so exactly what this means varies according to
    the host system.} %
\item The file is looked for relative to the directory containing the
  importing module.
\item If that fails, then the file is looked for in each directory on the
  module search path in turn.
\item If the file still isn't found, an error is reported and the import
  fails.
\end{itemize}
At this point, if the file has previously been imported, nothing further
happens.\footnote{%
  This check is done using @|truename|, so it should see through simple
  tricks like symbolic links.  However, it may be confused by fancy things
  like bind mounts and so on.} %

Recursive imports, either direct or indirect, are an error.
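The search procedure can be approximated in C as below.  This is only a sketch for a Unix-like system: the functions @|module_filename| and @|find_module| are hypothetical, and the search path is represented as a null-terminated array of directory names.  The real translator uses Lisp pathnames, records previously imported files using @|truename|, and detects recursive imports; none of this is modelled here.

```c
#include <stdio.h>
#include <string.h>

/* Append ".sod" to name unless its last path component already has an
   extension.  A leading dot (as in ".hidden") does not count here. */
void module_filename(const char *name, char *out, size_t n)
{
  const char *base = strrchr(name, '/');
  base = base ? base + 1 : name;
  if (*base && strchr(base + 1, '.'))
    snprintf(out, n, "%s", name);            /* already has an extension */
  else
    snprintf(out, n, "%s.sod", name);
}

/* Nonzero if the file at path can be opened for reading. */
static int readable(const char *path)
{
  FILE *f = fopen(path, "r");
  if (!f) return 0;
  fclose(f);
  return 1;
}

/* Try the importing module's directory, then each search directory in
   turn.  Returns 1 with the path in out on success, or 0 on failure.
   Over-long paths are silently truncated in this sketch. */
int find_module(const char *name, const char *here,
                const char *const *search, char *out, size_t n)
{
  char file[256];

  module_filename(name, file, sizeof file);
  snprintf(out, n, "%s/%s", here, file);
  if (readable(out)) return 1;
  for (; *search; search++) {
    snprintf(out, n, "%s/%s", *search, file);
    if (readable(out)) return 1;
  }
  return 0;
}
```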

\subsubsection{Loading extensions}
\begin{grammar}
<load-definition> ::= "load" <string-literal> ";"
\end{grammar}

The Lisp file named by the @<string-literal> is loaded and evaluated.

A search is made for a Lisp source file as follows.
\begin{itemize}
\item The name is converted into a filename by appending @`.lisp',
  if it has no extension already.\footnote{%
    Technically, what happens is @|(merge-pathnames name (make-pathname :type
    "LISP" :case :common))|, so exactly what this means varies according to
    the host system.} %
\item A search is then made in the same manner as for module imports
  (\xref{sec:syntax.module.simple}).
\end{itemize}
If the file is found, it is loaded using the host Lisp's @|load| function.

Note that Sod doesn't attempt to compile Lisp files, or even to look for
existing compiled files.  The right way to package a substantial extension to
the Sod translator is to provide the extension as a standard ASDF system (or
similar) and leave a dropping @|foo-extension.lisp| in the module path saying
something like
\begin{prog}
  (asdf:load-system :foo-extension)
\end{prog}
which will arrange for the extension to be compiled if necessary.

(This approach means that the language doesn't need to depend on any
particular system definition facility.  It's bad enough already that it
depends on Common Lisp.)

\subsubsection{Lisp escapes}
\begin{grammar}
<lisp-definition> ::= "lisp" <s-expression> ";"
\end{grammar}

The @<s-expression> is evaluated immediately.  It can do anything it likes.

\begin{boxy}[Warning!]
  This means that hostile Sod modules are a security hazard.  Lisp code can
  read and write files, start other programs, and make network connections.
  Don't install Sod modules from sources that you don't trust.\footnote{%
    Presumably you were going to run the corresponding code at some point, so
    this isn't as unusually scary as it sounds.  But please be careful.} %
\end{boxy}

\subsubsection{Declaring type names}
\begin{grammar}
<typename-definition> ::=
  "typename" <list>$[\mbox{@<identifier>}]$ ";"
\end{grammar}

Each @<identifier> is declared as naming a C type.  This is important because
the C type syntax -- which Sod uses -- is ambiguous, and disambiguation is
done by distinguishing type names from other identifiers.

Don't declare class names using @"typename"; use @"class" forward
declarations instead.


\subsection{Literal code} \label{sec:syntax.module.literal}

\begin{grammar}
<code-definition> ::=
  "code" <identifier> ":" <item-name> @[<constraints>@]
  "{" <c-fragment> "}"

<constraints> ::= "[" <list>$[\mbox{@<constraint>}]$ "]"

<constraint> ::= @<item-name>^+

<item-name> ::= <identifier> @! "(" @<identifier>^+ ")"
\end{grammar}

The @<c-fragment> will be output unchanged to one of the output files.

The first @<identifier> is the symbolic name of an output file.  Predefined
output file names are @|c| and @|h|, which are the implementation code and
header file respectively; other output files can be defined by extensions.

Output items are named with a sequence of identifiers, separated by
whitespace, and enclosed in parentheses.  As an abbreviation, a name
consisting of a single identifier may be written as just that identifier,
without the parentheses.

The @<constraints> provide a means for specifying where in the output file
the output item should appear.  (Note the two kinds of square brackets shown
in the syntax: literal square brackets must appear around the constraints if
they are present, and the other kind indicates that the constraints may be
omitted entirely.)  Each comma-separated @<constraint>
is a sequence of names of output items, and indicates that the output items
must appear in the order given -- though the translator is free to insert
additional items in between them.  (The particular output items needn't be
defined already -- indeed, they needn't be defined ever.)
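One way to satisfy ordering constraints of this kind is a topological sort.  The C sketch below uses Kahn's algorithm over output items numbered from zero; a constraint naming items $a$, $b$, $c$ contributes the pairs $(a, b)$ and $(b, c)$.  The function @|order_items| and this numeric representation are hypothetical, and ties are broken by item number, which is not necessarily what the translator does.

```c
#define MAX_ITEMS 16                         /* fixed limit for the sketch */

/* Order n items (n <= MAX_ITEMS) so that each pair (before[i], after[i])
   is respected.  Returns 1 and fills order[] on success, or 0 if the
   constraints are cyclic. */
int order_items(int n, int npairs, const int before[], const int after[],
                int order[])
{
  int indeg[MAX_ITEMS] = { 0 }, done[MAX_ITEMS] = { 0 }, i, j, k;

  for (i = 0; i < npairs; i++) indeg[after[i]]++;
  for (k = 0; k < n; k++) {
    for (j = 0; j < n && (done[j] || indeg[j]); j++) {
      /* find the lowest-numbered item with no unplaced predecessors */
    }
    if (j == n) return 0;                    /* nothing ready: a cycle */
    done[j] = 1;
    order[k] = j;
    for (i = 0; i < npairs; i++)             /* release j's successors */
      if (before[i] == j) indeg[after[i]]--;
  }
  return 1;
}
```

Here items which are not mentioned in any constraint, such as item 2 in the test, are free to appear anywhere, matching the text's statement that additional items may be placed in between.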

There is a predefined output item @|includes| in both the @|c| and @|h|
output files which is a suitable place for inserting @|\#include|
preprocessor directives in order to declare types and functions for use
elsewhere in the generated output files.


\subsection{Class definitions} \label{sec:syntax.module.class}

\begin{grammar}
<class-definition> ::= <class-forward-declaration>
\alt <full-class-definition>
\end{grammar}

\subsubsection{Forward declarations}
\begin{grammar}
<class-forward-declaration> ::= "class" <identifier> ";"
\end{grammar}

A @<class-forward-declaration> informs Sod that the @<identifier> will be
used to name a class which is currently undefined.  Forward declarations are
necessary in order to resolve certain kinds of circularity.  For example,
\begin{prog}
class Sub;                                                      \\+

class Super: SodObject \{                                       \\ \ind
  Sub *sub;                                                   \-\\
\};                                                             \\+

class Sub: Super \{                                             \\ \ind
  /* \dots\ */                                                \-\\
\};
\end{prog}
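The need for the forward declaration has a close parallel in plain C, where an identifier must be known to be a type name before a declaration such as @|Sub *sub;| can be parsed.  The following C sketch is only an analogy, not the code that Sod generates: the @|typedef| plays the role of @|class Sub;|, and embedding a @|Super| in @|struct Sub| merely stands in for inheritance.

```c
/* Like "class Sub;": Sub is now known to be a type name. */
typedef struct Sub Sub;

typedef struct Super {
  Sub *sub;                                  /* pointer to incomplete type */
} Super;

struct Sub {
  Super super;                               /* stand-in for inheritance */
  int extra;
};
```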

\subsubsection{Full class definitions}
\begin{grammar}
<full-class-definition> ::=
  @[<properties>@]
  "class" <identifier> ":" <list>$[\mbox{@<identifier>}]$
  "{" @<properties-class-item>^* "}"

<properties-class-item> ::= @[<properties>@] <class-item>

<class-item> ::= <slot-item>
\alt <initializer-item>
\alt <initarg-item>
\alt <fragment-item>
\alt <message-item>
\alt <method-item>
\alt <other-item> \fixme{undefined}
\end{grammar}

A full class definition provides a complete description of a class.

The first @<identifier> gives the name of the class.  It is an error to
give the name of an existing class (other than a forward-referenced class),
or an existing type name.  It is conventional to give classes `MixedCase'
names, to distinguish them from other kinds of identifiers.

The @<list>$[\mbox{@<identifier>}]$ names the direct superclasses for the new
class.  It is an error if any of these @<identifier>s does not name a defined
class.  The superclass list is required, and must not be empty; listing
@|SodObject| as your class's superclass is a good choice if nothing else
seems suitable.  A class with no direct superclasses is called a \emph{root
class}.  It is not possible to define a root class in the Sod language: you
must use Lisp to do this, and it's quite involved.

The @<properties> provide additional information.  The standard class
properties are as follows.
\begin{description}
\item[@|lisp_class|] The name of the Lisp class to use within the translator
  to represent this class.  The property value must be an identifier; the
  default is @|sod_class|.  Extensions may define classes with additional
  behaviour, and may recognize additional class properties.
\item[@|metaclass|] The name of the Sod metaclass for this class.  In the
  generated code, a class is itself an instance of another class -- its
  \emph{metaclass}.  The metaclass defines which slots the class will have,
  which messages it will respond to, and what its behaviour will be when it
  receives them.  The property value must be an identifier naming a defined
  subclass of @|SodClass|.  The default metaclass is @|SodClass|.
  See \xref{sec:concepts.metaclasses} for more details.
\item[@|nick|] A nickname for the class, to be used to distinguish it from
  other classes in various limited contexts.  The property value must be an
  identifier; the default is constructed by forcing the class name to
  lower-case.
\end{description}

The class body consists of a sequence of @<class-item>s enclosed in braces.
These items are discussed in the following sections.

\subsubsection{Slot items}
\begin{grammar}
<slot-item> ::=
  @<declaration-specifier>^+ <list>$[\mbox{@<init-declarator>}]$ ";"

<init-declarator> ::= <simple-declarator> @["=" <initializer>@]
\end{grammar}

A @<slot-item> defines one or more slots.  All instances of the class and any
subclass will contain these slots, with the names and types given by the
@<declaration-specifier>s and the @<init-declarator>s.  Slot declarators may
not contain dotted names.

It is not possible to declare a slot with function type: such an item is
interpreted as being a @<message-item> or @<method-item>.  Pointers to
functions are fine.

Properties:
\begin{description}
\item[@|slot_class|] A symbol naming the Lisp class to use to represent the
  direct slot.
\item[@|initarg|] An identifier naming an initialization argument which can
  be used to provide a value for the slot.  See
  \xref{sec:concepts.lifecycle.birth} for the details.
\item[@|initarg_class|] A symbol naming the Lisp class to use to represent
  the initarg.  Only permitted if @|initarg| is also set.
\end{description}

An @<initializer>, if present, is treated as if a separate
@<initializer-item> containing the slot name and initializer were present.
For example,
\begin{prog}
[nick = eg]                                                     \\
class Example: Super \{                                         \\ \ind
  int foo = 17;                                               \-\\
\};
\end{prog}
means the same as
\begin{prog}
[nick = eg]                                                     \\
class Example: Super \{                                         \\ \ind
  int foo;                                                      \\
  eg.foo = 17;                                                \-\\
\};
\end{prog}

\subsubsection{Initializer items}
\begin{grammar}
<initializer-item> ::= @["class"@] <list>$[\mbox{@<slot-initializer>}]$ ";"

<slot-initializer> ::= <dotted-name> @["=" <initializer>@]

<initializer> ::= <c-fragment>
\end{grammar}

An @<initializer-item> provides an initial value for one or more slots.  If
prefixed by @|class|, then the initial values are for class slots (i.e.,
slots of the class object itself); otherwise they are for instance slots.

The first component of the @<dotted-name> must be the nickname of one of the
class's superclasses (including itself); the second must be the name of a
slot defined in that superclass.

Properties:
\begin{description}
\item[@|initializer_class|] A symbol naming the Lisp class to use to
  represent the initializer.
\item[@|initarg|] An identifier naming an initialization argument which can
  be used to provide a value for the slot.  See
  \xref{sec:concepts.lifecycle.birth} for the details.  An initializer item
  must have either an @|initarg| property, or an initializer expression, or
  both.
\item[@|initarg_class|] A symbol naming the Lisp class to use to represent
  the initarg.  Only permitted if @|initarg| is also set.
\end{description}

Each class may define at most one initializer item with an explicit
initializer expression for a given slot.

\subsubsection{Initarg items}
\begin{grammar}
<initarg-item> ::=
  "initarg"
  @<declaration-specifier>^+
  <list>$[\mbox{@<init-declarator>}]$ ";"
\end{grammar}
Properties:
\begin{description}
\item[@|initarg_class|] A symbol naming the Lisp class to use to represent
  the initarg.
\end{description}

\subsubsection{Fragment items}
\begin{grammar}
<fragment-item> ::= <fragment-kind> "{" <c-fragment> "}"

<fragment-kind> ::= "init" | "teardown"
\end{grammar}

\subsubsection{Message items}
\begin{grammar}
<message-item> ::=
  @<declaration-specifier>^+
  <keyword-declarator>$[\mbox{@<identifier>}]$
  @[<method-body>@]
\end{grammar}
Properties:
\begin{description}
\item[@|message_class|] A symbol naming the Lisp class to use to represent
  the message.
\item[@|combination|] A keyword naming the aggregating method combination to
  use.
\item[@|most_specific|] A keyword, either @`first' or @`last', according to
  whether the most specific applicable method should be invoked first or
  last.
\end{description}

Properties for the @|custom| aggregating method combination:
\begin{description}
\item[@|retvar|] An identifier for the return value from the effective
  method.  The default is @|sod__ret|.  Only permitted if the message return
  type is not @|void|.
\item[@|valvar|] An identifier holding each return value from a direct method
  in the effective method.  The default is @|sod__val|.  Only permitted if
  the method return type (see @|methty| below) is not @|void|.
\item[@|methty|] A C type, which is the return type for direct methods of
  this message.  The default is the return type of the message.
\item[@|decls|] A code fragment containing declarations to be inserted at the
  head of the effective method body.  The default is to insert nothing.
\item[@|before|] A code fragment containing initialization to be performed at
  the beginning of the effective method body.  The default is to insert
  nothing.
\item[@|empty|] A code fragment executed if there are no primary methods;
  it should usually store a suitable (identity) value in @<retvar>.  The
  default is not to emit an effective method at all if there are no primary
  methods.
\item[@|first|] A code fragment to set the return value after calling the
  first applicable direct method.  The default is to use the @|each|
  fragment.
\item[@|each|] A code fragment to set the return value after calling a direct
  method.  If @|first| is also set, then it is used after the first direct
  method instead of this.  The default is to insert nothing, which is
  probably not what you want.
\item[@|after|] A code fragment inserted at the end of the effective method
  body.  The default is to insert nothing.
\item[@|count|] An identifier naming a variable to be declared in the
  effective method body, of type @|size_t|, holding the number of applicable
  methods.  The default is not to provide such a variable.
\end{description}
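To show how these fragments fit together, here is a hypothetical hand-written C expansion of an effective method for a message which sums the results of its direct methods, with @|combination| set to @|custom|, @|empty| storing zero in @|sod__ret|, @|first| assigning @|sod__val| to @|sod__ret|, and @|each| adding @|sod__val| to it.  The direct methods are modelled as an array of function pointers in calling order; the code which Sod actually generates is organized differently.

```c
#include <stddef.h>

/* A direct method, modelled as a function from int to int. */
typedef int direct_method(int);

/* Assumed properties, roughly:
     [combination = custom, empty = { sod__ret = 0; },
      first = { sod__ret = sod__val; }, each = { sod__ret += sod__val; }] */
static int effective_method(direct_method *const *methods, size_t n, int arg)
{
  int sod__ret;                              /* retvar */
  int sod__val;                              /* valvar, of type methty */
  size_t i;

  /* The decls and before fragments would go here; both are empty. */
  if (n == 0) {
    sod__ret = 0;                            /* empty: identity for + */
  } else {
    for (i = 0; i < n; i++) {
      sod__val = methods[i](arg);            /* call a direct method */
      if (i == 0) sod__ret = sod__val;       /* first */
      else sod__ret += sod__val;             /* each */
    }
  }
  /* The after fragment would go here; it is empty. */
  return sod__ret;
}

/* Two example direct methods. */
static int twice(int x) { return 2*x; }
static int plus_one(int x) { return x + 1; }
```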

\subsubsection{Method items}
\begin{grammar}
<method-item> ::=
  @<declaration-specifier>^+
  <keyword-declarator>$[\mbox{@<dotted-name>}]$
  <method-body>

<method-body> ::= "{" <c-fragment> "}" | "extern" ";"
\end{grammar}
Properties:
\begin{description}
\item[@|method_class|] A symbol naming the Lisp class to use to represent
  the direct method.
\item[@|role|] A keyword naming the direct method's rôle.  For the built-in
  `simple' message classes, the acceptable rôle names are @|before|,
  @|after|, and @|around|.  By default, a primary method is constructed.
\end{description}

%%%----- That's all, folks --------------------------------------------------

%%% Local variables:
%%% mode: LaTeX
%%% TeX-master: "sod.tex"
%%% TeX-PDF-mode: t
%%% End: