X-Git-Url: https://git.distorted.org.uk/~mdw/sod/blobdiff_plain/1f7d590d9c7b87442c8d8b6424ed4f769d377692..1ad4b33a5c4b390d728ef15c0eb85e53b0383c50:/doc/syntax.tex diff --git a/doc/syntax.tex b/doc/syntax.tex index de85ce8..35ab100 100644 --- a/doc/syntax.tex +++ b/doc/syntax.tex @@ -7,7 +7,7 @@ %%%----- Licensing notice --------------------------------------------------- %%% -%%% This file is part of the Sensble Object Design, an object system for C. +%%% This file is part of the Sensible Object Design, an object system for C. %%% %%% SOD is free software; you can redistribute it and/or modify %%% it under the terms of the GNU General Public License as published by @@ -26,36 +26,52 @@ \chapter{Module syntax} \label{ch:syntax} %%%-------------------------------------------------------------------------- +\section{Notation} \label{sec:syntax.notation} -Fortunately, Sod is syntactically quite simple. I've used a little slightly -unusual notation in order to make the presentation easier to read. For any -nonterminal $x$: +Fortunately, Sod is syntactically quite simple. The notation is slightly +unusual in order to make the presentation shorter and easier to read. + +Anywhere a simple nonterminal name $x$ may appear in the grammar, an +\emph{indexed} nonterminal $x[a_1, \ldots, a_n]$ may also appear. On the +left-hand side of a production rule, the indices $a_1$, \ldots, $a_n$ are +variables which vary over all nonterminal and terminal symbols, and the +variables may also appear on the right-hand side in place of a nonterminal. +Such a rule stands for a family of rules, in which each variable is replaced by +each possible simple nonterminal or terminal symbol. + +The letter $\epsilon$ denotes the empty nonterminal: +\begin{quote} + \syntax{$\epsilon$ ::=} +\end{quote} + +The following indexed productions are used throughout the grammar, some often +enough that they deserve special notation. 
\begin{itemize} -\item $\epsilon$ denotes the empty nonterminal: - \begin{quote} - $\epsilon$ ::= - \end{quote} -\item @[$x$@] means an optional $x$: +\item @[$x$@] abbreviates @$[x]$, denoting an optional occurrence + of $x$: \begin{quote} - \syntax{@[$x$@] ::= $\epsilon$ @! $x$} + \syntax{@[$x$@] ::= $[x]$ ::= $\epsilon$ @! $x$} \end{quote} -\item $x^*$ means a sequence of zero or more $x$s: +\item $x^*$ abbreviates @$[x]$, denoting a sequence of zero or + more occurrences of $x$: \begin{quote} - \syntax{$x^*$ ::= $\epsilon$ @! $x^*$ $x$} + \syntax{$x^*$ ::= $[x]$ ::= + $\epsilon$ @! $[x]$ $x$} \end{quote} -\item $x^+$ means a sequence of one or more $x$s: +\item $x^+$ abbreviates @$[x]$, denoting a sequence of one or + more occurrences of $x$: \begin{quote} - \syntax{$x^+$ ::= $x$ $x^*$} + \syntax{$x^+$ ::= $[x]$ ::= $[x]$ $x$} \end{quote} -\item $x$@<-list> means a sequence of one or more $x$s separated - by commas: +\item @$[x]$ denotes a sequence of one or more occurrences of $x$ + separated by commas: \begin{quote} - \syntax{$x$<-list> ::= $x$ @! $x$<-list> "," $x$} + \syntax{$[x]$ ::= $x$ @! $[x]$ "," $x$} \end{quote} \end{itemize} -\subsection{Lexical syntax} -\label{sec:syntax.lex} +%%%-------------------------------------------------------------------------- +\section{Lexical syntax} \label{sec:syntax.lex} Whitespace and comments are discarded. The remaining characters are collected into tokens according to the following syntax. @@ -72,7 +88,8 @@ This syntax is slightly ambiguous, and is disambiguated by the \emph{maximal munch} rule: at each stage we take the longest sequence of characters which could be a token. -\subsubsection{Identifiers} \label{sec:syntax.lex.id} + +\subsection{Identifiers} \label{sec:syntax.lex.id} \begin{grammar} ::= @^* @@ -100,7 +117,8 @@ also an ambiguity (inherited from C) in the declaration syntax which is settled by distinguishing type names from other identifiers at a lexical level. 
-\subsubsection{String and character literals} \label{sec:syntax.lex.string} + +\subsection{String and character literals} \label{sec:syntax.lex.string} \begin{grammar} ::= "\"" @^* "\"" @@ -131,7 +149,7 @@ discouraged. \alt \alt - ::= @^* + ::= "0" | @^* ::= "0" @("b"|"B"@) @^+ @@ -152,13 +170,15 @@ Sod understands only integers, not floating-point numbers; its integer syntax goes slightly beyond C in allowing a @`0o' prefix for octal and @`0b' for binary. However, length and signedness indicators are not permitted. -\subsubsection{Punctuation} \label{sec:syntax.lex.punct} + +\subsection{Punctuation} \label{sec:syntax.lex.punct} \begin{grammar} ::= any nonalphanumeric character other than "_", "\"" or "'" \end{grammar} -\subsubsection{Comments} \label{sec:lex-comment} + +\subsection{Comments} \label{sec:syntax.lex.comment} \begin{grammar} ::= @@ -176,37 +196,35 @@ binary. However, length and signedness indicators are not permitted. ::= any character other than "*" or "/" - ::= "//" @^* + ::= "/\,/" @^* ::= a newline character ::= any character other than newline \end{grammar} -Comments are exactly as in C99: both traditional block comments `\texttt{/*} -\dots\ \texttt{*/}' and \Cplusplus-style `\texttt{//} \dots' comments are -permitted and ignored. +Comments are exactly as in C99: both traditional block comments `@|/*| \dots\ +@|*/|' and \Cplusplus-style `@|/\,/| \dots' comments are permitted and +ignored. -\subsection{Special nonterminals} -\label{sec:special-nonterminals} + +\subsection{Special nonterminals} \label{sec:syntax.lex.special} Aside from the lexical syntax presented above (\xref{sec:lexical-syntax}), two special nonterminals occur in the module syntax. -\subsubsection{S-expressions} \label{sec:syntax-sexp} - +\subsubsection{S-expressions} \begin{grammar} ::= an S-expression, as parsed by the Lisp reader \end{grammar} When an S-expression is expected, the Sod parser simply calls the host Lisp -system's \textsf{read} function. 
Sod modules are permitted to modify the -read table to extend the S-expression syntax. +system's @|read| function. Sod modules are permitted to modify the read +table to extend the S-expression syntax. S-expressions are self-delimiting, so no end-marker is needed. -\subsubsection{C fragments} \label{sec:syntax.lex.cfrag} - +\subsubsection{C fragments} \begin{grammar} ::= a sequence of C tokens, with matching brackets \end{grammar} @@ -220,7 +238,8 @@ determined by the immediately surrounding context -- usually a closing brace or bracket. The first such delimiter character which is not enclosed in brackets, braces or parenthesis ends the fragment. -\subsection{Module syntax} \label{sec:syntax-module} +%%%-------------------------------------------------------------------------- +\section{Module syntax} \label{sec:syntax.module} \begin{grammar} ::= @^* @@ -233,13 +252,12 @@ brackets, braces or parenthesis ends the fragment. \alt \end{grammar} -A module is the top-level syntactic item. A module consists of a sequence of -definitions. +A @ is the top-level syntactic item. A module consists of a sequence +of definitions. -\subsection{Simple definitions} \label{sec:syntax.defs} - -\subsubsection{Importing modules} \label{sec:syntax.defs.import} +\subsection{Simple definitions} \label{sec:syntax.module.simple} +\subsubsection{Importing modules} \begin{grammar} ::= "import" ";" \end{grammar} @@ -268,8 +286,7 @@ happens.\footnote{% Recursive imports, either direct or indirect, are an error. -\subsubsection{Loading extensions} \label{sec:syntax.defs.load} - +\subsubsection{Loading extensions} \begin{grammar} ::= "load" ";" \end{grammar} @@ -303,26 +320,25 @@ which will arrange for the extension to be compiled if necessary. particular system definition facility. It's bad enough already that it depends on Common Lisp.) 
-\subsubsection{Lisp escapes} \label{sec:syntax.defs.lisp} - +\subsubsection{Lisp escapes} \begin{grammar} ::= "lisp" ";" \end{grammar} The @ is evaluated immediately. It can do anything it likes. -\textbf{Warning!} This means that hostile Sod modules are a security hazard. -Lisp code can read and write files, start other programs, and make network -connections. Don't install Sod modules from sources that you don't -trust.\footnote{% - Presumably you were going to run the corresponding code at some point, so - this isn't as unusually scary as it sounds. But please be careful.} % - -\subsubsection{Declaring type names} \label{sec:syntax.defs.typename} +\begin{boxy}[Warning!] + This means that hostile Sod modules are a security hazard. Lisp code can + read and write files, start other programs, and make network connections. + Don't install Sod modules from sources that you don't trust.\footnote{% + Presumably you were going to run the corresponding code at some point, so + this isn't as unusually scary as it sounds. But please be careful.} % +\end{boxy} +\subsubsection{Declaring type names} \begin{grammar} ::= - "typename" ";" + "typename" $[\mbox{@}]$ ";" \end{grammar} Each @ is declared as naming a C type. This is important because @@ -332,16 +348,19 @@ done by distinguishing type names from other identifiers. Don't declare class names using @"typename"; use @"class" forward declarations instead. -\subsection{Literal code} \label{sec:syntax-code} + +\subsection{Literal code} \label{sec:syntax.module.literal} \begin{grammar} ::= - "code" ":" @[@] + "code" ":" @[@] "{" "}" - ::= "[" "]" + ::= "[" $[\mbox{@}]$ "]" + + ::= @^+ - ::= @^+ + ::= @! "(" @^+ ")" \end{grammar} The @ will be output unchanged to one of the output files. @@ -350,28 +369,29 @@ The first @ is the symbolic name of an output file. Predefined output file names are @"c" and @"h", which are the implementation code and header file respectively; other output files can be defined by extensions. 
-The second @ provides a name for the output item. Several C -fragments can have the same name: they will be concatenated together in the -order in which they were encountered. +Output items are named with a sequence of identifiers, separated by +whitespace, and enclosed in parentheses. As an abbreviation, a name +consisting of a single identifier may be written as just that identifier, +without the parentheses. The @ provide a means for specifying where in the output file the output item should appear. (Note the two kinds of square brackets shown in the syntax: square brackets must appear around the constraints if they are present, but that they may be omitted.) Each comma-separated @ -is a sequence of identifiers naming output items, and indicates that the -output items must appear in the order given -- though the translator is free -to insert additional items in between them. (The particular output items -needn't be defined already -- indeed, they needn't be defined ever.) +is a sequence of names of output items, and indicates that the output items +must appear in the order given -- though the translator is free to insert +additional items in between them. (The particular output items needn't be +defined already -- indeed, they needn't be defined ever.) There is a predefined output item @"includes" in both the @"c" and @"h" output files which is a suitable place for inserting @"\#include" preprocessor directives in order to declare types and functions for use elsewhere in the generated output files. -\subsection{Property sets} \label{sec:syntax.propset} +\subsection{Property sets} \label{sec:syntax.module.properties} \begin{grammar} - ::= "[" "]" + ::= "[" $[\mbox{@}]$ "]" ::= "=" \end{grammar} @@ -385,17 +405,17 @@ A property has a name, given as an @, and a value computed by evaluating an @. The value can be one of a number of types, though the only operators currently defined act on integer values only. 
-\subsubsection{The expression evaluator} \label{sec:syntax.propset.expr} - +\subsubsection{The expression evaluator} \begin{grammar} - ::= | "+" | "-" + ::= | "+" | "--" ::= | "*" | "/" - ::= | "+" | "-" + ::= | "+" | "--" ::= | | | +\alt "<" ">" \alt "?" \alt "(" ")" \end{grammar} @@ -413,7 +433,8 @@ causes the @ to be evaluated using the Lisp \textsf{eval} function. %%% FIXME crossref to extension docs -\subsection{C types} \label{sec:syntax.c-types} + +\subsection{C types} \label{sec:syntax.module.types} Sod's syntax for C types closely mirrors the standard C syntax. A C type has two parts: a sequence of @s and a @. In @@ -421,31 +442,55 @@ Sod, a type must contain at least one @ (i.e., `implicit @"int"' is forbidden), and storage-class specifiers are not recognized. -\subsubsection{Declaration specifiers} \label{sec:syntax.c-types.declspec} - +\subsubsection{Declaration specifiers} \begin{grammar} ::= \alt "struct" | "union" | "enum" \alt "void" | "char" | "int" | "float" | "double" \alt "short" | "long" \alt "signed" | "unsigned" +\alt "bool" | "_Bool" +\alt "imaginary" | "_Imaginary" | "complex" | "_Complex" \alt +\alt +\alt + + ::= | "const" | "volatile" | "restrict" + + ::= @^+ + + ::= + "(" ")" - ::= "const" | "volatile" | "restrict" + ::= "atomic" | "_Atomic" + + ::= "(" ")" + + ::= "alignas" | "_Alignas" ::= \end{grammar} A @ is an identifier which has been declared as being a type name, -using the @"typename" or @"class" definitions. +using the @"typename" or @"class" definitions. The following type names are +defined in the built-in module. +\begin{itemize} +\item @"va_list" +\item @"size_t" +\item @"ptrdiff_t" +\item @"wchar_t" +\end{itemize} Declaration specifiers may appear in any order. However, not all combinations are permitted. A declaration specifier must consist of zero or -more @, and one of the following, up to reordering. +more @s, zero or more @s, and one of the +following, up to reordering. 
\begin{itemize} \item @ +\item @ \item @"struct" @, @"union" @, @"enum" @ \item @"void" +\item @"_Bool", @"bool" \item @"char", @"unsigned char", @"signed char" \item @"short", @"unsigned short", @"signed short" \item @"short int", @"unsigned short int", @"signed short int" @@ -455,35 +500,40 @@ more @, and one of the following, up to reordering. \item @"long long", @"unsigned long long", @"signed long long" \item @"long long int", @"unsigned long long int", @"signed long long int" \item @"float", @"double", @"long double" +\item @"float _Imaginary", @"double _Imaginary", @"long double _Imaginary" +\item @"float imaginary", @"double imaginary", @"long double imaginary" +\item @"float _Complex", @"double _Complex", @"long double _Complex" +\item @"float complex", @"double complex", @"long double complex" \end{itemize} All of these have their usual C meanings. -\subsubsection{Declarators} \label{sec:syntax.c-types.declarator} - +\subsubsection{Declarators} \begin{grammar} -$[k]$ ::= @^* $[k]$ +$[k, a]$ ::= @^* $[k, a]$ -$[k]$ ::= $k$ -\alt "(" $[k]$ ")" -\alt $[k]$ @^* +$[k, a]$ ::= $k$ +\alt "(" $[k, a]$ ")" +\alt $[k, a]$ @$[a]$ ::= "*" @^* - ::= "[" "]" -\alt "(" ")" +$[a]$ ::= "[" "]" +\alt "(" $a$ ")" - ::= $\epsilon$ | "..." -\alt @["," "..."@] + ::= $\epsilon$ | "\dots" +\alt $[\mbox{@}]$ @["," "\dots"@] ::= @^+ - ::= @[ @! $\epsilon$@] + ::= $[\epsilon, \mbox{@}]$ - ::= @[@] + ::= $[\mbox{@ @! $\epsilon$}]$ - ::= "." + ::= + $[\mbox{@ @! $\epsilon$}, \mbox{@}]$ - ::= @[@] + ::= + $[\mbox{@}, \mbox{@}]$ \end{grammar} The declarator syntax is taken from C, but with some differences. @@ -498,15 +548,33 @@ The declarator syntax is taken from C, but with some differences. The remaining differences are (I hope) a matter of presentation rather than substance. -\subsection{Defining classes} \label{sec:syntax.class} +There is additional syntax to support messages and methods which accept +keyword arguments. 
+ +\begin{grammar} + ::= @["=" @] + + ::= + @[$[\mbox{@}]$@] + "?" @[$[\mbox{@}]$@] + + ::= @! + + ::= "." + +$[k]$ ::= + $[k, \mbox{@}]$ +\end{grammar} + + +\subsection{Class definitions} \label{sec:syntax.module.class} \begin{grammar} ::= \alt \end{grammar} -\subsubsection{Forward declarations} \label{sec:class.class.forward} - +\subsubsection{Forward declarations} \begin{grammar} ::= "class" ";" \end{grammar} @@ -514,30 +582,33 @@ substance. A @ informs Sod that an @ will be used to name a class which is currently undefined. Forward declarations are necessary in order to resolve certain kinds of circularity. For example, -\begin{listing} -class Sub; - -class Super : SodObject { - Sub *sub; -}; +\begin{prog} +class Sub; \\+ -class Sub : Super { - /* ... */ -}; -\end{listing} +class Super : SodObject \{ \\ \ind + Sub *sub; \-\\ +\}; \\+ -\subsubsection{Full class definitions} \label{sec:class.class.full} +class Sub : Super \{ \\ \ind + /* \dots\ */ \-\\ +\}; +\end{prog} +\subsubsection{Full class definitions} \begin{grammar} ::= @[@] - "class" ":" - "{" @^* "}" + "class" ":" $[\mbox{@}]$ + "{" @^* "}" - ::= ";" + ::= @[@] + + ::= +\alt +\alt +\alt \alt \alt -\alt ";" \end{grammar} A full class definition provides a complete description of a class. @@ -547,8 +618,12 @@ give the name of an existing class (other than a forward-referenced class), or an existing type name. It is conventional to give classes `MixedCase' names, to distinguish them from other kinds of identifiers. -The @ names the direct superclasses for the new class. It -is an error if any of these @s does not name a defined class. +The @$[\mbox{@}]$ names the direct superclasses for the new +class. It is an error if any of these @s does not name a defined +class. The superclass list is required, and must not be empty; listing +@|SodObject| as your class's superclass is a good choice if nothing else +seems suitable. 
It's not possible to define a \emph{root class} in the Sod +language: you must use Lisp to do this, and it's quite involved. The @ provide additional information. The standard class properties are as follows. @@ -573,20 +648,18 @@ properties are as follows. The class body consists of a sequence of @s enclosed in braces. These items are discussed on the following sections. -\subsubsection{Slot items} \label{sec:sntax.class.slot} - +\subsubsection{Slot items} \begin{grammar} ::= - @[@] - @^+ + @^+ $[\mbox{@}]$ ";" - ::= @["=" @] + ::= @["=" @] \end{grammar} A @ defines one or more slots. All instances of the class and any subclass will contain these slot, with the names and types given by the @ and the @. Slot declarators may not -contain qualified identifiers. +contain dotted names. It is not possible to declare a slot with function type: such an item is interpreted as being a @ or @. Pointers to @@ -595,68 +668,79 @@ functions are fine. An @, if present, is treated as if a separate @ containing the slot name and initializer were present. For example, -\begin{listing} -[nick = eg] -class Example : Super { - int foo = 17; -}; -\end{listing} +\begin{prog} +[nick = eg] \\ +class Example : Super \{ \\ \ind + int foo = 17; \-\\ +\}; +\end{prog} means the same as -\begin{listing} -[nick = eg] -class Example : Super { - int foo; - eg.foo = 17; -}; -\end{listing} - -\subsubsection{Initializer items} \label{sec:syntax.class.init} - +\begin{prog} +[nick = eg] \\ +class Example : Super \{ \\ \ind + int foo; \\ + eg.foo = 17; \-\\ +\}; +\end{prog} + +\subsubsection{Initializer items} \begin{grammar} - ::= @["class"@] + ::= @["class"@] $[\mbox{@}]$ ";" - ::= "=" + ::= @["=" @] - :: "{" "}" | + ::= \end{grammar} An @ provides an initial value for one or more slots. If prefixed by @"class", then the initial values are for class slots (i.e., slots of the class object itself); otherwise they are for instance slots. 
-The first component of the @ must be the nickname of -one of the class's superclasses (including itself); the second must be the -name of a slot defined in that superclass. +The first component of the @ must be the nickname of one of the +class's superclasses (including itself); the second must be the name of a +slot defined in that superclass. -The initializer has one of two forms. -\begin{itemize} -\item A @ enclosed in braces denotes an aggregate initializer. - This is suitable for initializing structure, union or array slots. -\item A @ \emph{not} beginning with an open brace is a `bare' - initializer, and continues until the next @`,' or @`;' which is not within - nested brackets. Bare initializers are suitable for initializing scalar - slots, such as pointers or integers, and strings. -\end{itemize} +An @|initarg| property may be set on an instance slot initializer (or a +direct slot definition). See \xref{sec:concepts.lifecycle.birth} for the +details. An initializer item must have either an @|initarg| property, or an +initializer expression, or both. -\subsubsection{Message items} \label{sec:syntax.class.message} +Each class may define at most one initializer item with an explicit +initializer expression for a given slot. +\subsubsection{Initarg items} \begin{grammar} - ::= - @[@] - @^+ @[@] + ::= + "initarg" + @^+ + $[\mbox{@}]$ ";" +\end{grammar} + +\subsubsection{Fragment items} +\begin{grammar} + ::= "{" "}" + + ::= "init" | "teardown" \end{grammar} -\subsubsection{Method items} \label{sec:syntax.class.method} +\subsubsection{Message items} +\begin{grammar} + ::= + @^+ + $[\mbox{@}]$ + @[@] +\end{grammar} +\subsubsection{Method items} \begin{grammar} ::= - @[@] - @^+ + @^+ + $[\mbox{@}]$ + ::= "{" "}" | "extern" ";" \end{grammar} - %%%----- That's all, folks -------------------------------------------------- %%% Local variables: